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I.  INTRODUCTION 

Digital  processing  of  speech  signals  has  become 
important  and  necessary  with  the  introduction  of  high-speed 
digital  devices  into  every  phase  of  communication:  place  to 
place;  man  to  machine;  and  machine  to  man. 

Digital  signals  have  a  number  of  inherent  advantages 
over  analog  signals.   Digital  signals  may  be  coded  for 
security  or  for  noise  immunity.   A  digital  voice  signal  may 
be  transmitted  by  the  same  equipment  used  for  data   and  it 
may  be  multiplexed  with  that  data.   One  of  the  primary 
disadvantages  of  the  digital  transmission  of  voice  is  the 
large  bandv/idth  required  with  some  digital  techniques. 
When  analog  techniques,  such  as  single  side-band  amplitude 
modulation,  produce  bandwidths  of  5KHz  and  the  best  digital 
system  bandwidth  was  6^+khz,  there  was  a  very  strong 
tendency  to  stay  with  the  analog  techniaues. 

However,  recent  advances  in  digital  signal  processing 
have  made  the  digital  transmission  of  voice  highly 
efficient.  Until  recently  digital  trarvsmi  ss  i  on  of  soeech 
was  possible  only  by  sampling  the  voice  waveform  at  a 
sufficiently  high  rate  and  then  performing  an 
analog-to-digital  conversion  of  each  sample.  A  sufficient 
number  of  bits  were  transmitted  for  each  sample  which  was 
sent  to  reconstruct  the  waveform  at  the  reciever.   The 
voice  waveform  must  be  sampled  at  aproximately  8,000 


samples  per  second  to  avoid  the  loss  of  clarity.   Each  of 
the  samples  must  then  be  converted  to  a  6-10  bit  number  for 
transmission.  The  overall  data  rate  using  these  methods  had 
a  lower  limit  in  the  neighborhood  of  48,000  bits  per 
second . 

Recent  developments  have  allowed  the  voice  pattern   to 
be  broken  down  into  more  basic  parameters  which  are    closely 
associated  with  the  physical  production  of  speech.  These 
parameters  vary  rather  slowly  and  can  be  transmitted  at  a 
lov/er  rate.  Data  rates  as  low  as  1200  bits  per  second  have 
been  achieved  through  the  use  of  these  techniques. 

These  methods  are    numerical  representations  of  the 
physical  production  of  speech,  and  therefore  it  is  easier 
to  alter  the  characteristics  of  speech  by  altering  the 
associated  parameters  then  by  trying  to  alter  the  waveform 
d  i  recti y . 

This  thesis  reviews  various  digital  speech  processing 
techniques  for    use  in  a  speech,  modification  system.  Linear 
predictive  coding  (LPC)  v/as  chosen  for  implementation  and 
therefore  the  theory  and  practice  of  this  techniaue  are 
explained  in  detail.   The  desired  modification  of  the 
speech  waveform  by  shifting  the  poles  of  its  characteristic 
polynomial,  and  the  regeneration  of  the  altered  waveform 
are  discussed  and  the  implementation  techniques  explained. 
The  IBM  360  computer  was  used  for  simulating  the  techniaues 
developed.  This  simulation  is  covered  in  detail  and  the 
computer  programs,  with  results,  are    provided. 


II.  SPEECH  PRODUCTION  AND  CHARACTERISTICS 

Any  digital  system  for  altering  speech  characteristics 
must  be  based  on  knowledge  of  those  characteristics  and  the 
physical  structure  which  determines  them. 

A.   SPEECH  CHARACTERISTICS 

All  speech  can  be  broken  down  into  a  set  of  distinctive 
sounds  called  phonemes,  in  the  case  of  American  English, 
there  are  generally  considered  to  be  k2    distinct  phonemes 
which  are    classified  into  vowels,  diphthongs,  semivcwels 
and  consonants.   Spoken  communication  is  accomplished 
through  various  combinations  of  these  sounds  and  the 
accurate  reproduction  of  each  is  a  major  criteria  in 
judging  voice  processing  systems.   Phonemes  are  generated 
at  a  rate  of  about  ten  per  second.   Each  phoneme  is 
classified  as  voiced  if  vocal  cord  vibration  is  the  source 
of  the  sound  or  unvoiced  if  the  sound  is  produced  by  other 
means.   If  the  characteristics  of  a  phoneme  change  from  the 
start  to  finish,  the  phoneme  is  called  noncont i nuant .  Those 
phonemes  which  are    stationary  are    called  continuant. 

The  lowest  frequency  present  in  a  given  voiced  sound  is 
called  the  pitch  frequency.  There  are    peaks  in  the  spectral 
representation  of  a  speech  sound  that  are    above  the  pitch 
frequency  which  are    called  formants  and  are  numbered 
consecutively  with  increasing  frequency.   Although  two 
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speakers  may  produce  the  same  phoneme,  the  pitch  and 
formant  frequencies  may  be  different.  However,  general 
relationships  may  be  established  between  pitch  and  formant 
frequencies  which  are    relatively  constant  from  speaker  to 
speaker,  producing  the  same  phoneme.   If  information  is  to 
be  retained  by  a  speech  processing  system,  it  must  be  able 
to  reproduce  at  output,  the  pitch  and  formant  frequency 
relationship  which  was  present  at  the  input. 

B.   PHYSICAL  SPEECH  PRODUCTION  STRUCTURE 

The  vocal  tract  is  a  resonant  tube  with  the  vocal  cords 
at  one  end  and  the  lips  at  the  other.   The  vocal  tract  acts 
as  a  frequency  selective  filter  which  has  a  transfer 
function  that  depends  on  how  it  is  shaped  at  any  given 
t  i  me  . 


(A)  VOICED 
FIGURE     1. 


(B)    UNVOICED 
SOUND    PRODUCTION 


The  input  to  the  vocal  tract  is  caused  by  either  the 
vibration  of  the  vocal  cords  at  the  lower  end  (figure  l.a) 
or  by  the  turbulence  of  air  being  forced  through  a 
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constriction  at  any  of  a  number  of  locations  along  the 
vocal  tract  (figure  l.b).  The  vocal  tract  acts  as  a  filter 
with  a  pulsed  input  from  the  vocal  cords  when  producing 
voiced  sounds  such  as  'a'  or  'o'.   During  sounds  caused  by 
the  forcing  of  air  through  a  constriction,  fricative  sounds 
like  's'  or  'f,  the   vocal  tract  acts  as  a  resonant  cavity 
which  will  have  certain  characteristic  response 
frequencies.  Typical  waveforms  for  voiced  and  unvoiced 
sounds  are    shown  in  figure  2. 


VOICED 


**HW*-+*^^ 


FIGURE     2 


UN  VO  (CED 
TYPICAL     WAVEFORMS 


Certain  characteristics  of  the  vocal  tract  are    changed 
several  times  per  second  to  produce  different  sounds  while 
others  such  as  overall  length  and  the  diameter  range  limits 
are  fixed  for  a  given  speaker.   A  detailed  look  at  each  of 
the  types  of  sounds  will  insure  that  the  digital  processor 
used  has  the  same  flexibility  as  the  actual  speaker. 

Vowels,   voiced  continuant  sounds,  are  produced  when 
the  vocal  cords  vibrate  causing  pulses  of  air  at  the  bottom 
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of  the  vocal  tract.  The  shape  of  the  vocal  tract  remains 
fixed  during  vowel  production,  acting  as  a  stationary 
filter  to  respond  to  the  forcing  function. 

The  production  of  diphthongs  and  semivowels  is  similar 
to  that  of  vowels  except  that  the  shape  of  the  vocal  tract 
is  smoothly  changed  during  voicing.  Diphthongs  and 
semivowels  are    noncon t i nuan t,  voiced  sounds. 

The  phonemes  classified  as  consonants  may  actually  be 
further  divided  into  subca tagor i es  of  voiced  fricatives, 
unvoiced  fricatives,  stops  and  nasals.   Fricatives  are 
caused  by  the  steady  flow  of  air  through  a  constriction  in 
the  vocal  tract  wh i ch  causes  turbulant  air  motion  and  a 
seemingly  random  air  pressure  pattern.   Fricatives  are 
voiced  or  unvoiced  depending  on  whether  the  vocal  cords  are 
producing  pressure  pulses  at  the  same  time.   Stops  or 
plosives  are    caused  by  completely  closing  the  vocal  tract 
and  then  suddenly  opening  it  to  quickly  start  sound 
production.   A  stop  is  classified  as  voiced  or  unvoiced 
depending  on  the  nature  of  the  sound  that  follows  the 
opening  of  the  vocal  tract.  Nasals  are    voiced  sounds  which 
are    formed  when  the  vocal  tract  is  closed  and  air  is 
allowed  to  pass  through  the  nasal  cavity.   This  acts  as  a 
feed  forward  path  for  the  sound  and  a  corresponding  change 
is  caused  in  the  total  vocal  tract  response. 

C.   INFORMATION  CONTENT 

One  of  the  primary  goals  of  speech  processing  is  the 
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development  of  efficient  codes  for  transmitting  or  storing 
speech  and  still  allowing  it  to  be  reconstructed  without 
excessive  loss  of  information.  The  source  coding  theorem 
states  that  through  the  proper  choice  of  coding  we  can  code 
a  source  into  a  bit  sequence  arbitrarily  close  in  length  to 
the  entropy  of  that  source.  However,  efficient  codes  are 
difficult  to  find  for  even  simple  binary  sources,  let  alone 
a  continuous  speech  source.  An  estimation  of  the  entropy  of 
a  typical  speech  source  provides  a  useful  guage  for 
measuring  the  data  rate  performance  of  any  system. 

If  'excessive  loss  of  information'  occurs  only  when  we 
don't  receive  the  correct  one  of  the  hi    phonemes,  the 
information  content  of  one  second  of  speech  is 
approximately  (assuming  10  phonemes  are    produced  per 
second ) : 

hi 
H  =  10  7"P(P;  )  ("log  P(p.  )) 

i=l 

where  P(p.)  is  the  probability  of  the  i th  phoneme.  Assuming 

further  that  each  phoneme  is  equally  likely, 

H  =  10  x  hi    x  1/hl    x  log  hi    =  54  bits  per  second 

If  the  actual  probability  of  each  phoneme  was  used,  i.e. 
they  are    not  equally  likely,  the  value  of  entropy  would  be 
significantly  lower. 

If  'excessive  loss  of  information'  also  includes 
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failure  to  Identify  the  speaker  and  failure  to  indicate  the 
speaker's  emotional  state  the  information  content  is 
higher.   However  if  we  assume  that  identification  of  the 
speaker  (one  of  about  two  billion)  is  only  reauired  once 
per  minute  and  that  the  speaker's  emotional  state  (say  one 
of  ten)  can  only  change  once  per  second  the  entropy  is 
still  only  58  bits. 

H(speaker)  =  1/60  x  10   x  1/10   x  (-log(l/10  ))  =  0.5 

H(emotion)  =  10  x  1/10  x  (-log  (1/10))  =  3.3 

H(phoneme)  =  5U  bits  per  second 

H(total)  =  58  bits  per  second 

Clearly  the  theoretical  limit  is  not  being  pushed  by  the 
current  state  of  the  art  in  speech  coding. 
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III.  DIGITAL  SPEECH  PPOCESSIMG  TECHNIQUES 

Digital  speech  processing  techniques  may  be  placed  into 
three  general  categories  based  on  the  assumptions  used  in 
their  development.   The  first  category  is  that  of  waveform 
techniques  where  the  only  primary  assumption  is  that  the 
signal  which  is  being  processed  is  frequency  limited  to  no 
more  than  half  of  the  sampling  freauency.   The  second 
category  of  spectral  methods  adds  the  assumption  that  the 
frequency  domain  characteristics  of  the  speech  waveform 
vary  slowly.   Finally,  the  voice  tract  parameter  techniaues 
assume  that  the  physical  voice  production  system  can  be 
model ed  di  g  i  tal 1 y . 

A.   WAVEFORM  METHODS 

Waveform  techniques  have  the  characteristic  of 
operating  equally  well  on  any  low-pass  filtered  waveform 
and  all  are    generally  based  on  the  familar  pulse  code 
modulation.   The  basic  requirements  of  a  waveform 
quantization  method  is  that  the  waveform  be  sampled  at 
greater  than  twice  the  highest  frequency  present  and  that 
the  samples  be  quantized  into  a  digital  code  for 
transmission.   Although  this  technique  is  very  straight 
forward,  it  also  requires  a  high  data  rate.   A  waveform 
sampled  9600  times  per  second  with  each  sample  quantized  to 
256  levels  would  require  76,800  bits  per  second  for 
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transmission.   A  number  of  variations  (differential 
modulation  and  adaptive  differential  modulation)  have  been 
used  to  reduce  the  required  data  rate  but  have  failed  to 
cut  the  required  data  rate  by  more  than  about  half. 

B.   SPECTRAL  TECHNIQUES 

1.   Short  Term  Frequency  Analysis 

These  methods  deal  with  the  short-term  freauency 
properties  of  the  speech  signal.   An  early  spectral  method 
was  the  channel  vocoder.  The  transmitting  processor  of  the 
channel  vocoder  consists  of  a  bank  of  narrow-band  analog 
filters.   The  energy  passed  by  each  filter  is  measured  and 
transmitted  to  the  receiver  site.  It  is  also  determined 
whether  the  input  speech  was  voiced  or  unvoiced  and  that 
determination  is  transmitted.   In  the  receiver  an 
excitation  signal,  determined  by  the  voicing  decision,  was 
fed  into  a  bank  of  narrow-band  filters,  each  of  which  had 
an   adjustable  gain  determined  by  the  received  energy 
measu  remen  ts . 

The  same  technique  can  be  implemented  in  an  all 
digital  method  by  replacing  the  bani<  of  analog  filters  with 
digital  filters  or  by  performing  a  discrete  Fourier 
transformation  (DFT)  on  a  frame  of  input  samples.   The  use 
of  the  DFT  is  usually  preferred  because  or  computational 
efficiency  and  the  availability  of  high-speed  DFT  array 
processors.   Normally  each  input  frame  is  windowed  to 
reduce  the  noise  which  can  be  cajsed  by  a  sharp  cut  off  at 
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the  end  of  a  frame.  When  this  method  is  used  to  reduce  the 
data  rate  required  for  digital  transmission,  the  total  DFT 
of  each  frame  is  not  transmitted  because  the  total  DFT 
would  require  the  same  number  of  bits  as  the  frame  of 
samples  (assuming  both  are    quantized  to  the  same  number  of 
levels).   Reduction  in  the  data  rate  can  be  accomplished  by 
skippirg  frames  and  assuming  they  are    duplicates  of  the 
preceeding  frame  during  reconstruction.   The  number  of 
samples  in  the  frame  is  also  half  the  number  of  frequencies 
resolved  by  the  DFT,  therefore  the  frame  length  for 
analysis  is  choosen  as  a  compromize  between  accuracy  of 
voice  reproduction  and  the  desire  for  a  low  data  rate. 

This  method  of  speech  processing  wculd  lend  itself 
well  to  altering  the  frequency  characteristics  of  voice 
signals  but  it  reauires  a  relatively  high  data  transmission 
rate  and  therefore  was  not  desirable  for  speech  orocessing 
in  conjunction  wi th  place  to  place  communications  or  with 
digitally  stored  speech. 

2 .   Homomorphic  Processing 

Another  method  which  involves  freauency  domain 
processing  is  homomorphic  processing.   It  is  based  on  zhe 
following  three  principles: 

(1)  Speech  is  the  convolution  of  an  excitation 
function  and  the  transfer  function  of  the  vocal 
tract . 

(2)  Convolution  in  the  time  domain  is  equivalent  to 
multiplication  in  the  frequency  domain. 

(3)  The  Fourier  transform  is  a  linear 
transformation,  i.e. 
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F(x(t)+y(t))  =  F(x(t))  +  F(y(t))  =  X(w)  +  Y(w) 
A  method  of  separating  a  speech  waveform  back  into  these 
components  would  help  us  analyze  the  speech.   Homomorphic 
processing  centers  around  the  efficient  deconvol ut ion  of 
these  s  i gna 1 s . 

First  the  input  signal  is  windowed  and  transformed 
via  the  DFT,  to  produce  the  frequency  domain  representation 
of  the  input  speech.  The  time  convoluticn  of  two  signals  is 
equivalent  to  multiplication  in  the  frequency  domain. 
However  knowing  the  product  of  two  waveforms  does  little 
toward  gaining  knowledge  of  the  multiplicands  unless 
further  information  is  given.   The  multiplication  of  the 
two  values  at  a  given  frequency  is  eauivalent  to  adding  the 
logarithms  of  each.   The  log  is  taken  of  each  of  the  values 
in   the  frequency  domain  representation  of  the  signal  which 
is  then  equal  to  the  sum  of  the  the  log  of  the  frequency 
domain  representation  of  the  excitation  function  plus  the 
the  log  of  the  frequency  domain  representation  of  the  vocal 
tract  function.  However,  it  is  easier  to  tell  the 
difference  between  the  vocal  tract  excitation  functions  in 
the  time  domain,  so  the  inverse  DFT  is  taken, of  the  log  of 
the  frequency  domain  function.  The  function  produced  is 
called  the  cepstrum  of  the  signal.   Because  taking  the 
inverse  DFT  is  a  linear  function,  and  the  frequency  domain 
function  was  the  sum  of  two  component  functions,  the  time 
domain  cepstrum  must  also  be  the  sum  of  the  cepstrum  of  the 
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excitation  function  and  the  cepstrum  of  the  vocal  tract 
function.  Figure  3  illustrates  the  relationship  between 
the  steps  of  homomorphic  deconvol ut i on  of  signals. 

Examination  of  the  cepstrum  between  2.5  and  20 
msec,  may  reveal  a  peak  that  is  considerably  above  the 
background  noise  level.   If  a  peak  is  there,  the  segment  is 
determined  to  be  voiced  with  the  peak  occuring  at  the  pitch 
period.  The  vocal  tract  is  not  long  enough  to  sustain  any 
vibrations   for  more  than  20  msec,  after  a  pulsed  input. 
If  there  is  no  peak  the  segment  is  considered  unvoiced. 
The  cepstrum  of  the  excitation  function  may  be  subtracted 
from  the  total  cepstrum  and  the  remainder  considered  an 
estimate  of  the  cepstrum  of  the  vocal  tract  transfer 
function.   After  working  backwards  to  magnitude  (vs.  log  of 
magnitude)  in  the  frequency  domain,  the  filter  coefficients 
may  be  determined. 

It  would  be  relatively  straight  forward  to  alter 
both  the  excitation  function  and  the  vocal  tract  transfer 
function  after  the  total  cepstrum  !s  broken  into  its 
additive  components.  However,  homomorphic  processing  was 
not  being  widely  used  for  voice  communication  and  this 
technique  was  dropped  in  favor  of  a  more  widely  used 
system.  As  array  fast  Fourier  transform  processors  become 
faster  and  less  expensive,  homomorphic  speech  processing 
may  become  the  dominant  speech  communication  techniaue. 


21 


C.   VOICE  TRACT  PARAMETER  TECHNIQUES  IN  THE  TIME  DOMAIN 

The  primary  characteristic  of  this  catagory  is  the 
close  tie  between  the  digital  process  and  the  physical 
structure  being  modeled.  Although  homomorphic  processing 
uses  the  deconvol ut i on  of  the  vocal  tract  function  and  the 
excitation  function  as  a  primary  tool,  the  homomorphic 
process  does  require  transformations  to  and  from  the 
frequency  domain  and  therefore  is  not  included  in  this 
catagory.  The  primary  member  of  this  catagory  is  the  linear 
prediction  coding  (LPC)  process  which  has  shown  itself  to 
be  among  the  best  and  most  versitile  of  the  various  speech 
processing  techniques. 
1.   The  Speech  Model 

The  speech  model  assumed  and  used  for  LPC  is  that 
of  a  time-varying  digital  filter  which  is  excited  by  a 
wide-band  function,  either  a  pulsed  input  or  random  noise. 
This  is  illustrated  in  figure  h.    The  recursive  filter  used 
to  model  the  vocal  tract  is  all-polo  and  has  slowly  time 
varying  (pseudo-stationary)  coefficients.  The  filter's 
z-domain  transfer  function  is 


Y(z)   =      p 
i=l 


a  .  ^ 
I 


or 
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Y(z)   =  U 


Cz)  +  it"  a. 

i=l 


Z  )Y(z) 


or  in  the  discrete  t i me-doma i n 


Y(nT)  =  UC 


„T,  ♦  £  , 

1=1 


a.Y((n-i )T) 


From  the  time  domain  equation  it  is  clear  that  the  current 
output  Y(nT)  is  uniauely  specified  in  terms  of  the  current 
input  and  the  past  p  output  values. 
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The  vocal  tract  is  not  always  best  modeled  by  an  all-pole 
filter,  and  particularly  nasal  sounds  would  probably  be 
best  modeled  by  a  filter  which  also  included  zeros.  However 
there  is  considerable  difficulty  in  rapidly  estimating  both 
ooles  and  zeros  of  a  transfer  function  when  only  a  short 
segment  of  the  output  is  available  for  analysis.  However, 
experience  has  shown  that  high  quality  voice  production  is 
possible  by  using  an  all-pole  filter  of  adequate  order. 

The  order  of  the  filter  required  is  closely  related 
to  the  length  of  the  vocal  tract.  To  adequately  represent 
the  lower  frequency  response  of  the  vocal  tract,  the  filter 
must  include  recursive  delay  equal  to  the  delay  encountered 
by  sound  waves  traveling  from  the  vocal  cords  to  the  lips 
and  returning  to  the  glottis. 

velocity  of  sound  =  Zkh    m/sec 
length  of  vocal  tract  =  17  cm 

2  x  0.17  =  0.988  msec 
3kk 

At  a  sampling  rate  of  10kHz  at  least  10  past  values  would 

need  to  be  included  for  an  accurate  model. 

The  excitation  function  for  voiced  sounds  in 

modeled  by  a  train  of  pulses  at  the  glottis.  Clearly  these 

pulses  can  not  be  a  perfect  set  of  impulses,  but  rather 

must  have  a  finite  width  and  are    likely  to  have  a  definite 

shape.  Rather  than  construct  a  separate  filter  to  change 

the  impulses  into  the  correct  shape,  additional  poles  are 

added  to  the  model  so  that  the  combined  transfer  function 
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may  be  calculated  at  once.  Normally  two  additions  poles  are 
adequate  for  the  pulse  shape  model. 
2 .   Linear  Predictive  Techniques 

Linear  predictive  analysis  is  based  on  the  division 
of  speech  modeling  into  modeling  of  the  excitation  function 
and  modeling  of  the  vocal  tract  transfer  function.  The 
vocal  tract  is  modeled  by  computing  each  sample  as  a 
weighted  linear  combination  of  previous  samples.  Linear 
predictive  coding  of  speech  is  accomplished  by  filtering  a 
sampled  speech  waveform  through  a  filter  which  is  the 
inverse  of  the  filter  which  models  the  vocal  tract.  If  the 
filter  used  is  the  inverse  of  a  good  model  of  the  vocal 
tract,  the  output  will  be  a  good  approximation  of  the 
excitation  function.  The  various  properties  of  the 
excitation  function,  along  with  the  coefficients  used  in 
the  vocal  tract  filter  are  measured  and  transmitted  as 
shown  in  figure  5. 
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The  received  measurements  are    used  in  the  decoding 
processor  to  reconstruct  the  excitation  function  and  the 
filter.  The  process  of  reconstructing  the  speech  waveform 
is  shown  in  figure  6. 
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FIGURE     6.        DECODING      PROCESS 

The  primary  advantage  in  the  use  of  linear 
predictive  coding  of  speech  is  the  reduction  in  the  data 
rate  required  for  transmission  or  storage.  LPC  systems  have 
been  developed  which  require  data  rates  from  3000  to  U 8 0 0 
bits  per  second  for  high  quality  voice  communication  and 
rates  as  low  as  1200  bits  per  second  have  been  reported  for 
lower  quality  but  understandable  speech  production.  Highly 
efficient  algorithms  have  been  developed  for  the  encoding 
and  decoding  of  speech  using  the  LPC  technique.  When 
hardware  implemented  with  special  purpose,  short  word 
length  microprocessors,  the  computations  reauired  for 
two-way  communication  have  been  done  in  65%  of  real  time. 

LPC  was  chosen  as  the  method  to  be  used  for 
accomplishing  the  desired  voice  characteristic 
modifications.  A  detailed  description  of  the  theory  and 
modeling  assumptions  follows. 
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IV.   LINEAR  PREDICTION  THEORY 

Linear  prediction  is  an  extension  of  least  squares 
estimation.   In  the  case  of  one-dimensional  linear 
prediction,  it  is  more  commonly  labeled  as  time  series 
analysis  when  used  by  statisticians  for  analysis  of 
everything  from  population  to  the  stock  market. 

A.   THEORY 

It  is  assumed  that  each  sample  of  the  discrete  time 
series,  s(kT),  as  shown  in  figure  7  may  be  approximated  by 
a  linear  combination  of  past  samples  of  the  time  series. 

m 
sCkT)  =\  a.  sCCk-I )T) 

i=l 
where    s(kT)     is    the    estimated    sample    value,    a.     is    the 
coefficient   of    the   sample    i    steps    past    and   m    is    the   order 
of    the    approximation    (and    as    we   will    see    later    the    order    of 
the    z-domain    filter   of    the   model). 


s(kT) 


♦    * 


r 


FIGURE     7.       DISCRETE     TIME    SERIES 
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For    a    portion   of    the    discrete    time    series    (N    samples    where 
N>m)/    a    least    squares    approximation   of    the   weighting 
coefficients,    a.,    may   be    calculated.    The    estimate    at    each 
poi  nt 


m 


s(kT)    = 


a.    s(Ck-I)T) 


1=1 


1   <   k   <  m 

is  subtracted  from  the  actual  sample  value  and  the  error 
for  each  estimate,  e(kT)  is  given. 

e(kT)  =  s(kT)  -  s(kT) 

1  1  k  1  m 

m 


e(kT)  =  s(kT)  - 


a.  s((k-i)T) 
i 


i  =1 


1  <.  k  1  m 

To  minimize  the  error  (in  a  least  squares  sense)  the  error 
is  squared  and  summed  over  all  points  in  the  region  of 
interest  to  obtain  an  overall  error,  E. 


N  N 

2 


m 
i  =1 


12 


a.  sC(k-i)T) 


e  (kT)  =  ) 
k=l        k=l 
The  derivative  of  E  with  respect  to  each  of  the 
coefficients,  a.,  is  taken  and  set  equal  to  zero  in  order 
to  locate  the  minimum  of  E.  This  yields  the  following  m 
equat i ons  . 
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m 


m 


dg  =  o   = 

6*; 


k  =  l 


2  (s(kT)-\     a.s(Ck-I)T) )d_  U(kT)-V 


s(kT)-)     a.s(Ck-i)T) 
i  =1 

1  <  j    <  m 


however 


d_    [s(kT) 


aa 


=    0 


and 


JL  fa.sC 


(k-i )T) 


da 


=   0    ,      i    *   j 

=   s(Ck-j)T)     i    =   j 


therefore 


m 


d£   =    0    =     )     2 


da. 
j 


s(kT)-  \     a.s( (k-i )T) 


k  =  l     u  i=l 


(-1)    s((k-j )T) 


1  1   j    <.  m 


removing    the    constant   multiplier 


N 


N        m 
0    =     )     S(kT)s((k-j)T)    -     ;  )      a.s((k-i)T)s((k-j )T) 


k=l  k=l      i =1 

1    (    j    (m 

changing  the  order  of  summation 

N  m      N 

s(kT)s((k-j )T)  =  \  a.  \   s ( ( k-i ) T) s ( ( k-j ) T) 

k  =  l  1*1    k  =  l 

1  1  j  <.  m 

Given  all  of  the  samples  within  the  summations  over  N, 

the  above  set  of  m  equations  in  the  m  unknowns,  a.,  can  be 

solved.  If  only  the  samples 
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s(kT)     1  1  k  1  N 
are  given,  the  set  of  equations  above  can  not  be  solved 
because  of  the  requirement  to  know  the  samples 

s(U-j)T)    1  <.  j  <.  m 
However  by  windowing  the  samples  so  that  all  samples 
outside  the  region  of  interest  are   zero 

s(kT)  =0   k  <    0  and  k  >  N 
the  summations  over  N  in  the  set  of  equations  above  may  be 
replaced  by  the  autocorrelation  of  the  windowed  samples, 
s'(kT). 

N-J 
R(j)  =  \   s'(kT)s'((k+j )T) 

k  =  l 

0  1  j  <_  m 

This    assumption   may   be  made   because    the    number    of    samples, 
N,     is    normally  much    greater    than    the   order,    m,    of    the   set 
of    equations.    Therefore    relatively   few   samples    are    lost. 
The   window    function    used   will     not    significantly    alter    the 
samples    in    the    center    of    the   frame,    and    therefore    the 
resulting   coefficients    will    be    a    correct    approximation    for 
that    segment.    The   set    of    linear    equations    may    now   be 
wr  i  tten 

m 
R(j )    =   \     a.    R(i-j) 

i  =1 

1  <j    <_  m 

These  equations  may  now  be  solved  for  the  linear  predictive 
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coefficients,  a.,  1  <.  i  <.  m. 

If  the  system  being  studied  is  stationary  or  we  are 
only  considering  a  pseudo-stationary  segment  of  the  system 
output,  and  if  the  order  of  the  model  is  sufficiently  close 
to  the  order  of  the  real  system,  future  values  of  the 
variable  may  be  calculated  recursively  from  previous 
values.  In  the  following  section  we  will  see  how  this 
theory  is  applied  to  speech  modeling  and  reconstruction. 

B.   LINEAR  PREDICTIVE  CODING  FOR  VOICE  ANALYSIS 

The  digital  model  used  for  speech  synthesis  is  shown  in 
figure  3.  The  discrete  time  excitation  function  is  e(nT) 
and  the  synthesized  speech  output  is  s(nT). 

il   I   A    A 
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FIGURE   8.       SPEECH    SYNTHESIS  MODEL 

The   vocal    tract    filter    is    assumed    to    be    all-pole    and 
therefore    can    be    represented    by    the    z-domain    equation 


H(z)    =    S(z)    = 
E(z) 


m 


m 


TT    (z-p. ) 
i=l 


Multiplying   out    the   denominator    and    dividing    both    numerator 
and    denominator    by    zm    yields. 
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H(z)  =  S(z)  = 


m 


E(z)    1-  V   a.  z~' 
1=1  ' 

This  z-domain  equation  is  converted  to  a  discrete  time 

domain  equation  as  follows 

m 

S(z)  (  l-\  a.z"1  )  =  E(z) 

1=1 


m 


S(z)  =  E(z)  +  )  a.z"'  S(z) 


i  =1 


m 
s(nT)  =  e(nT)  +  \   a.s((n-i  )T) 

i  =1 
If  the  excitation  function  e(nT)  equals  zero  for  a  given 
sample,  then  this  equation  is  similar  to  the  first  equation 
in  the  previous  section  on  the  theory  of  linear  prediction. 
The  coefficients  of  the  z-domain  filter  transfer  function 
are  equivalent  to  the  linear  prediction  wieghting 
coeff i  ci  ents . 

Analysis  of  the  sampled  speech  waveform  is  used  to 
calculate  the  prediction  coefficients  which  are  then  used 
in  an  inverse  filter  to  determine  the  excitation  function 
from  the  input  speech.  This  inverse  filter  may  be 
represented  as 
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m 


E(z) 
S(z) 


=  1  - 


a.  2 
i 


i  =1 


or  as 


m 


E(nT)  =  S(nT)  -)        a.  s((n-i )T) 

!  =1 


and  is  construted  as  shown  In  figure  9. 
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FIGURE  9. 
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The    input    speech    has    been    broken    into    vocal    tract 
characteristics    determined    by    the    prediction    coefficients 
and    excitation    signal     characteristics    which    remain    to    be 
determined.    During    the    encoding    process    the   output   of    the 
inverse    filter   may    also    be    considered    an    error    signal 
because    it    is    the   difference    between    the    actual    speech 
sample    and    the    predicted    speech    sample. 

During   voiced    speech    the    vocal     tract    filter    in    figure    9 
acts    as    a   model    for    the    total     transfer    function   which    is 
due    to    the    glottal    pulse    shape,    the    actual    vocal     tract 
shape    and    the   output    reflection    at    the    lips.     Idealy    during 
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voiced    speech    all    of    these    effects    are    removed    by    the 
inverse    filter    and    the    error    function    is    a    train   of 
impulses    at    the    pitch    frequency. 

During   unvoiced    speech    the    physical    excitation    function 
is    a    pseudo-random   air    pressure    variation    caused    by 
turbulence    at    a    constriction    somewhere    along    the    vocal 
tract.    This   wide-band    source    is    filtered    by    the    portion   of 
the    vocal     tract    between    the    constriction    and    the    lips.    This 
portion   of    the    vocal     tract   will    resonate    at    certian 
characteristic    frequencies    but    normally    the    number    of    peaks 
in    the    frequency    domain    response   will    be    fewer    than    for 
voiced    sounds    because  of    the    shorter    segment    of    the    vocal 
tract    in   use.    During   encoding  of    unvoiced    speech    the   output 
of    the    inverse    filter    is    pseudo-random   because    the    inverse 
filter    can't    predict    the   output    due    to    the    random    input. 

The    speech   model     is    not    complete   with    just    the 
determination   of    the    coefficients    of    the    vocal     tract 
filter.    During   speech    reconstruction    it    is    necessary    to 
know: 

(1)  Which    excitation    signal,     pulses    or    noise,     to 
use. 

(2)  Excitation    pulse    period    for    voiced    sounds. 

(3)  The    gain   multiplication    factor. 

Although    these   quantities    are    not    necessarily   determined 
using    linear    prediction    theory,     they    are    none    the    less 
required    for    a   working   speech    encoding/decoding   system. 
During    encoding,     the   marked    difference    in    the    error 
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signal    for    voiced    and    unvoiced    speech    can    be   used    as    the 
basis    for    the    voiced/unvoiced    decision.    The    energy   of    the 
error    signal    for    voiced    speech    should    be    rather    small     in 
comparison    to    the    energy   of    the    input    samples.    On    the   other 
hand,    during   unvoiced    speech    the    prediction    is    poor    and 
most   of    the    energy    remains    after    filtering.    The    ratio   of 
the   average    energy   or    root-mean-square    value   of    the   speech 
samples    to    the    similar    quantity   of    the    error    signal    can    be 
used    to    make    the    voiced/unvoiced    decission.    This    ratio    is 
compared    to    an    empirically   determined    threshold    and    the 
segment    is    considered    voiced   whenever    the    ratio    is    greater 
than    the    threshold. 

The    gain    used    during    reconstruction    is    the    amplitude 
multiplier    of    the    excitation    signal    at    the    input   of    the 
vocal     tract    filter.    The    gain    used    during   unvoiced    speech 
may   be   simply    the    root-mean-square   of    the   error    signal. 
This    gain    coefficient    is    multiplied    by    the   output   of    a 
random    number    generator   which    produces    normally    distributed 
numbers    with   a    root-mean-square   value   of    unity. 

The   gain   of    voiced    speech   may    also    be   determined    from 
the    root-mean-square    value   of    the    error    signal.    However 
during    reconstruction   of    voiced    speech    the    entire    energy   of 
the    excitation    signal     is    concentrated    in    a    series    of 
impulses    which    should    have    the    same    root-mean-square    value. 
The      root-mean-square    value   of    a    series    of    d i screte- t i me 
impulses    with    amplitude,    a,    and    a    period,    p,     intervals    is 
approximated    by 
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The  output  of  a  unit  impulse  generator  should  then  be 
mu 1 1  i  pi i  ed  by 

1/2 
G  =  rms  p 

to  insure  that  the  same  energy  is  input  to  the  vocal  tract 

filter  as  was  output  by  the  filter  during  encoding.  The 

above  method  for  calculating  the  gain  needed  during 

reconstruction  is  based  on  the  assumption  that  the 

prediction  error  for  voiced  speech  is  caused  entirely  by 

the  physical  excitation  function  of  the  speaker.  However 

the  prediction  error  may  be  increased  because  the  vocal 

tract  was  changing  shape  rapidly  during  the  analysis  frame 

or  because  of  background  noise  at  the  microphone  which 

would  not  be  removed  by  the  inverse  filter.  Either  of  these 

would  cause  an  unwanted  gain  increase  during 

reconstruction.  A  typical  voiced  speech  waveform  and  the 

error  signal  generated  from  it  are  shown  in  figure  10. 
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(A)      VOICED    SPEECH 
WAVEFORM 


-"y^^^T^^^^ 


(B)        ERROR    SIGNAL 
WAVEFORM 


FIGURE     10  . 

The  reliable  determination  of  the  pitch  period  of 
voiced  speech  is  a  problem  for  which  the  ideal  solution  is 
still  undetermined.  The  periodic  increase  in  the  amplitude 
of  the  error  signal  at  the  pitch  period  is  shown  in  figure 
10(b)  and  suggests  the  use  of  the  error    signal  in  pitch 
period  determination.  A  number  of  algorithms  exist  for 
determination  of  the  pitch  period  which  generally  involve 
various  combinations  of  the  following  processes. 

(1)  Raising  the  error  signal  to  a  given  power. 

(2)  Low-pass  filtering  of  the  error  signal. 

(3)  Windowing  the  error  signal. 

(k)    Calculating  the  autocorrelation  function  of  the 
filtered  error  signal. 

(5)  Picking  the  peaks  of  the  autocorrelaticn 
f unct i  on. 

Experience  has  shown  that  pitch  determination  is 

computationally  as  difficult  as  the  LPC  parameter 
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determination    and    the    literature   on    the   subject    illustratas 
the    trade-off    between    hardware,    software,    computation    time 
and    reliability   from  method    to   method. 

C.       LPC    COMMUNICATION    SYSTEMS 

A    review   of    existing    LPC    communication    hardware    is 
useful    because   any  method  which    alters    formant    and    pitch 
characteristics    of    speech   will    be   most    successful     if    it    is 
com pa  table   with    these    systems. 

Currently   off-the-shelf   microprocessors    are    not    fast 
enough    to    handle    the   algorithms    described    in    real-time. 
However    special     purpose   units    which    are   designed    along 
computer    lines,    do   meet    the    real-time    criteria.    On    the 
surface    the   word    'computer'    might    not    seem    to    fit    these 
special    purpose  machines,    but    a    closer    look   will    reveal 
that    each    has    components    which    are    the    same    as    those   of    a 
computer:    stored    programming,    memory,     input,    output,    an 
arithmetic    logic   unit    (ALU),    an    instruction    set,    and 
control    components.    Two    processors    which    were    developed    at 
MIT's    Lincoln    Laboratory   will    be   used    to    illustrate    the 
state   of    the    art    in    LPC    voice    terminals    and    certain 
similarities    in    their    architecture   will    be    evident.    The 
first    processor    is    the  more   flexible   of    the    two    and    is 
designed    to    handle    a   wider    varity   of    algorithms.    The    second 
was    developed    about    a    year    later    and   was    designed 
specifically   for    LPC    algorithms    with   only   minor    changes. 

The   first    processor    to    be    covered    is    the    Lincoln 
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Digital    Voice   Terminal    (LCVT)    which   was    designed    and 
constructed    at    the    Lincoln    Laboratory    during    the    1973-73 
time   frame.    This    processor    is    capable   of    carrying  out    18 
million    basic    instructions    per    second   with    a    16-bit    by 
15-bit   multiplication    taking   four    times    as    long.    The 
execution    time    for    each    instruction    is    165    nsec.    which 
seems    to    conflict   with    the    instruction    rate.    This    is 
resolved    by    the    pipelining  of    the    three    portions    of    each 
basic    Instruction:    fetch,    decode,    and    execute.    The 
processor    has    separate  memories    for    data    and    the    program. 
The   data   memory    capacity    is    512    16-bit   words    and    the 
program  memory    contains    102U    16-bit    instructions.    The 
pipeline    instruction    processing    requires    that    the    buses    to 
and      from    the   ALU    be   seperate    and    each    is    unidirectional. 

Figure    11    shows    the   data    paths    of    the    LDVT    (none   of    the 
control    or    timing    lines    are    shown).    There    are   four    active 
registers:    the    P    register   which    is    the    program    counter   with 
multiplexed    inputs    from    the   address    portion   of    the 
instruction,    the   ALU,    the    sum  of    the    X    register    and    the 
address    portion   of    the    instruction,    and    itself    incremented 
by   one;    the    X    register   which    is    used    for    indexing  memory 
addresses;    the   A    register   which    is    the   accumulator;    and    the 
3    register   which    is    actually   a    pair   of    registers      used    for 
i  nput    and    output. 
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FIGURE    11 


LDVT     DATA        FLOW 


The   ALU    of    the    LDVT    as    shown    separately    in    figure    12, 
has    two    sections:    a    standard    programmable    ALU    which 
performs    logical,    addition    anc    compare   operations;    and    a 
16-bit    by    16-bit   multiplier    array   which    provides    a    52-bit 
result    in   just    k    cycles.    Either    of    these   may   be   used   with 
any    input,    however    due    to    their    common    input    and    output 
only   one  may   be   used    at    a    time. 

It    is    significant    to    note    some   of    the    requirements 
brought   on    by    the    pipelining   of    the    instructions.    The 
device    does    not    have    a   main    bus    over   which    data    flows    in 
both    directions.    Generally    all    data    flow    is    unidirectional 
and    In    the    case   of    the   ALU    input    buffer    registers    are 
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needed    to    hold    the   data    for    the    instruction    being   executed 
while    the    next    instruction  may   have    already    read    a    value 
from  memory    and    put    this    on    the    ALU    Input    line.     In    addition 
to    LPC    algorithms    at    2400,    3603    and    4800    bits    per    second, 
the    LDVT   has    been    programmed    for    adaptive    predictive    coding 
at    3000    bits    per    second    and    as    a    channel    vocoder    at    2400 
bi  ts    per    second. 


Prosrar.mable 
ALU 


Multiplier 
Array 


Multiplier 


Multiplicand 


VOR 


Control 


source 
Register 


Memory 


FIGURE    12.     LDVT    ALU 

The    second    speech    processor    is    the    Linear    Predictive 
Coding  Microprocessor    (LPCM)    which    is    di signed    strictly   as 
a    low    cost    LPC    terminal.    The   basic    cycle    time   for    this 
machine    is    150    nsec.    The    data   memory    has    2K    16-bit   words    of 
which    1.5K    is    ROM    and    0.5K    is    RAM.    The    program   memory 
contains    IK   of    48-bit   words.    The    LPCM    is    almost    free   of 
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instruction   decoding,    with    the   only    exception    being    the    ALU 
operation.    Figure    13    shows    the      instruction    format    and    in 
figure    Ik    it    is    evident    that    parts    of    the    instruction 
register    are   being    input    as    control    functions.    Figure    15    is 
a    block    diagram   of    the    LPCM    and    shows    the    two    buses    and    the 
large    number   of    registers    needed    to    control    the   data    flow. 

While    these  machines    have    varying   degrees    of 
adaptability/     it    does    not    appear    that    either    could    handle 
the   additional    computations    described    in    the    following 
sections    without   major    hardware   modifications.    However,    a 
special     purpose    LPC    code    converter    which    could    oe    used    in 
conjunction   with    an    existing    terminal    could    probably   be 
developed   which   would   operate    in    real-time   and    not    load    the 
existing   processor. 
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V.   ADJUSTMENT  OF  VOCAL  TRACT  PARAMETERS  USING  LPC 

One  reference  to  voice  characteristic  modification  was 
found  by  the  author  rAtal  and  Hauneur,  197l] .   Although 
scaling  of  pitch,,  formant  frequency  and  formant  bandwidth 
was  stated  to  have  been  accomplished,  no  description  of  the 
work  was  given.   Other  literature  did  provide  useful 
information  on  formant  frequencies  and  pitch  periods  which 
are  typical  for  various  speakers.  It  should  be  noted  that 
there  is  a  considerably  larger  variation,  from  speaker  to 
speaker,  in  pitch  period  than  in  formant  frequencies.  As  an 
example,  two  speakers,  saying  the  same  phoneme  could  easily 
have  pitch  periods  that  varied  by  a  factor  of  two,  yet  have 
only  a  10-20  per  cent  variation  in  formant  frequencies. 
Different  physical  structure  (vocal  cords  and  the  vocal 
tract)  produce  these  speech  characteristics  (pitch  period 
and  formant  frequencies,  respectively)  and  therefore  their 
variation  from  speaker  to  speaker  is  only  partially 
correl ated . 

The  coded  information  produced  from  input  voice  by  the 
LPC  processor  is  very  closely  related  to  the  physical 
structure  that  is  producing  the  sound.  On  output,  speech  is 
reconstructed  from  the  gain,  pitch  period  and 
voice/ unvoiced  parameters  as  well  as  the  vocal  tract 
prediction  coefficients.  The  gain  and  pitch  period  can  be 
varied  as  they  stand  but  the  variation  of  the  prediction 
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coefficients  is  somewhat  more  complicated.  The  goal  o^ 
varying  these  coefficients  before  reconstruction  is  to  have 
the  output  voice  have  different  pitch  period  and  formant 
frequencies  while  retaining  a  natural  sound  and  retaining 
the  same  information/  i.e.  the  same  sequence  of  phonemes 
and  voice  inflection. 

Voice  characteristics  are  associated  with  certain 
parameters  of  the  LPC  code.   First,  formant  frequencies  and 
bandwidths  are  associated  with  the  LPC  coefficients.  The 
amplitude  of  the  output  voice  is  associated  with  both  the 
gain  coefficient  and  the  formant  bandwidths.   The 
relationship  between  output  amplitude  and  the  formant 
bandwidth  is  due  to  the  increased  energy  in  the  impulse 
response  of  a  narrow  bandwidth  (high  Q)  transfer  function. 
This  is  noted  physically  by  the  fact  that  speakers  with 
highly  resonant  voices  may  speak  louder  for  the  same  amount 
of  energy  expended.  The  pitch  period  is  controled  by  the 
pitch  period  coefficient  only.   Finally,  the  voice/unvoiced 
decission  would  normally  not  be  changed.   The  exception 
would  be  if  one  was  reconstructing  whispered  speech  (the 
vocal  cords  are    stationary)  from  normal  speech. 

A.   ADJUSTMENT  OF  FORMANT  FREQUENCY  AND  BANDWIDTH 

The  vocal  tract  model  we  are    using  has  all  real 
coefficients  in  the  z-domain  polynomial.  Following  directly 
from  this  is  the  fact  that  all  poles  must  fall  either  on 
the  real  axis  of  the  z-plane  or  in  complex  conjugate  pairs. 
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Each  of  the  complex  conjugate  pairs  is  associated  with  one 

formant  (resonator)  of  the  speech  model.  The  vocal  tract 

transfer  function  is  the  product  of  these  resonator 

transfer  functions  which  are  each  of  the  following  form 

1 

H   (z)  =     -2TTCBW)  Ts  -1   -<+Tf(BW)  T5  -2 

f       l-2e         cos(2TT  F  T$   )  z  +  e         z 

where  F   is  the  center  frequency  of  the  formant,  f  ,  and  BW 

is  the  bandwidth   of  the  formant.  The  pole  locations 

associated  with  this  transfer  function  are 

z  =  x  +  j  y 

This  pair  of  poles  must  be  moved  in  order  to  alter  the 

frequency  and  bandwidth  of  this  resonant  section  of  the 

vocal  tract  model,  but  this  must  be  done  carefully  so  that 

the  poles  remain  inside  the  z-plane  unit  circle.  If  the 

desired  modification  of  the  input  speech  is  to  reduce  the 

bandwidth  (increase  Q)  of  the  formants,  the  poles  must  be 

moved  closer  to  the  unit  circle.  If  the  distance  from  the 

center  is  multiplied  by  a  constant  factor,  there  is  a 

danger  of  moving  poles  outside  the  unit  circle  and  thereby 

causing  instability  during  reconstruction.  However,  the 

magnitude  of  the  pole  is  always  less  than  one  and  may  be 

raised  to  any  positive  power  without  danger  of  crossing  the 

unit  circle.  It  is  shown  as  follows  that  raising  the 

magnitude  to  a  factor  is  equivalent  to  multiplying  the 

formant  bandwidth  by  that  same  factor. 

The  transfer  function  with  the  complex  conjugate  poles 

above  i  s : 
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H(z)  =      -1      2   2-2 
l-2x  2   +  (x  +y  )  z 

However  with  the  pole   locations  in  polar  form 

x  =  A  cos  9     Y  =  A  si  n  9 

and  making  use  of 

2     2 
cos  9+si  n  9  =  1 

the  eauations  becomes 

1 

H'(z)  = 


-1   2   -2. 

1-2A  cosS  z  +A    z 


Setting  the  terms  of  the  characteristic  equations  equal  we 
get 

-2TT  CBW)  T5 


2A  cos9   =  2e 


co  s  (  2  TT  F  Ts  ) 


and 


2     -i+TT  (BW)   T, 


A   =  e 

when  solved  for  A   and  9   give 

-2TT  (BW) 
A  "  =  e 


s 


S 


9   =  2TT  F  Ts 

and  i  nversel  y 

F   =9   /  2TT  Ts 

BW-  =  (-In  A  )  /  2TT  Ts 

If  new  formant  characteristics,  F'  and  BW',  are  desired 

where 

F'  =  /F 

and 
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BW1     =    C<BW 
they  may   be    implemented    by   moving    the    poles    of    the 
characteristic   equation    so    that 

9   •    =     /8 


and 

wh  i  ch    reduced    to 


In  A  *  =  Q(ln  A 


a 


A  »  =  A 

This  method  of  implementing  the  pole  shifts  guarantees 
that  no  unstable  poles  will  be  created  and  is  used  in  the 
following  section  in  the  realization  of  a  LPC  voice 
modification  system. 


3.   GAI N  ADJUSTMENT 

The  filter  coefficients  reconstructed  from  the 
relocated  poles  above  may  not  have  the  same  zero  frequency 
gain  characteristic  as  the  filter  used  for  inverse 
filtering  during  encoding.  This  situation  can  be 
illustrated  graphically  by  the  two  vocal  tract  transmission 
characteristics  shown  in  figure  16. 
Q(f)  G(f) 


BEFORE    PROCESSING  AFTER      PROCESSING 

(A)  (B) 

FIGURE     16.        FORMANT      GAIN 
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Although    the   formant    frequencies    in    15(b)    are    lower    than 

the    corresponding   frequencies    in    16(a)    as    was    desired,     the 

overall    gain  was    also    changed.    This   would    cause    the 

reconstructed    speech    no    be  much    softer    than    desired. 

A   solution    to    this    problem   was    to    adjust    the    excitation 

function    gain   used    during    reconstruction.    This    adjustment 

factor    would    be    equal     to    the    ratio   of    the   zero    frequency 

gains    of    the   original    and   modified    vocal     tract    filters.    The 

vocal     tract    has    the   following    z-domain    transfer    function. 

1 
H(z)    =  p 

1+Ta.z  "' 

i  =1 

The    above    equation    can    be    evaluated    at 

jTT    f/fs 
z    =    e 

to  obtain  the  gain  at  frequency  f.  Evaluating  the  above 

transfer  function  at  f=0  yields  the  following  equations. 

z"'  =  1 

and 


G(Q)  =     P 

!♦£., 

i=l 

This  equation  can  be  easily  evaluated  for  both  the 

coefficients  of  the  vocal  tract  transfer  function 

calculated  from  the  input  sequence  and  the  coefficients 

calculated  from  the  altered  pole  locations.  The  gain 

multiplication  factor  is  then  multiplied  by  the  energy 
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measured  in  the  error  signal  to  get  the  excitation  gain  to 
be  used  during  reconstruction. 

C.   PITCH  PERIOD  ADJUSTMENT 

The  adjustment  of  the  measured  pitch  period  may  almost 
go  without  explanation  except  to  note  that  if  the  pitch 
period  is  increased  and  all  other  coefficients  remain 
unchanged,  the  output  speech  would  be  softer.   This  is  due 
to  the  reduced  energy  (impulses  less  often)  being  input  to 
the  vocal  tract  filter  and  the  resulting  lower  energy  in 
the  output  speech. 
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VI.  COMPUTER  SIMULATION  OF  PITCH  AND  FORMANT  MODIFICATION 

The  process  of  pitch  and  formant  modification  was 
carried  out  on  the  IBM  360  computer  with  the  input  and 
output  being  accomplished  on  a  hybrid  system  consisting  of 
a  COMCOR  5000  analog  comDuter  and  an  XDS  9300  digital 
computer.  The  interface  between  the  XDS  9300  and  the  I 3M 
360  was  seven  track  digital  magnetic  tape.   All  work  was 
done  on  five  second  segments  to  allow  sufficient  length  for 
analysis  while  not  using  excessive  computer  processing 
t  ime . 

A.   VOICE  INPUT  AND  DIGITAL  SAMPLING 

The  input  voice  was  recorded  on  a  standard  single  tract 
audio  tape  recorder  at  7  1/2  inches  per  second  (ips). 
Recording  was  done  with  a  high  quality  microphone  in  a 
quiet  but  not  sound-proof  room.  This  digitizing  was  done  at 
half  speed  tc  allow  the  digital  computer  to  write  the  data 
onto  tape  without  missing  any  data.  This  recording  was 
played  back  at  3  3/ k    ins  with  the  output  directed  to  an 
amplifier  of  the  analog  computer.   The  voice  was  amplified 
to  a  level  aoprooriate  for  the  analog  computer  (a  +.100  volt 
machine).  The  amolifier  output  was  passed  through  two 
forth-order  analog  filters  set  at  2350  Hz  and  2 i4- 0 0  Hz  cut 
off  freauencies.   The  output  of  the  filters  was  then  put 
into  a  samole  and  hold  circuit  at  the  input  of  a  1 U  — b  i  t 
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analog  to  digital  converter.   The  Ik    bits  produced  were 
read  by  the  XDS  9300  and  placed  in  the  most  significant 
bits  of  the  2k    bit  XDS  9300  computer  word.   This  process  is 
illustrated  in  figure  17. 


ANALOG     TO 
DIGITAL 


0 


24    BIT     WORD 


FIGURE    17.      DATA   ACQUISITION 


The    samol  ing    rate    used   was    5000    Hz.       However    the    voice 
recording    was    played    back   at    half    speed   and    therefore    the 
equivalent      lowDass    filter   cut    off   and    the    equivalent 
sampling    rate    were   about    4-750    and    10,000    Hz    respectively. 

B.       XDS    93  00    OPERATION 

The    operation    of    the    XDS    9300    during    the    input    phase 
was    simply    to    read    the    data    available    at    the    output   of    the 
analog    to    digital    converter   and    place    this    data    in    an 
array.    When    an    array   of   1024-    samples   was    filled    it    was 
written    onto   a    seven    track    magnetic    tape.    This    was    done 
continuously    so    that    no    data    was    lost    between    blocks.       The 
voice    segment    as    it    existed    on    the    seven    track    tape 
consisted    of    50    blocks   of   1024    samples.    Each    sample    was 
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recorded  In  a  integer  format  ranging  from  +8388607  to 
-8388607  (±(2**23 >-l ) .  This  tape  was  then  used  as  the  input 
to  the  IBM  360. 

C.   IBM  360  IMPUT  PREPARATION 

When  the  24-bit  word,  seven  track  tape  created  by  the 
XDS  9300  was  read  by  the  IBM  360,  the  machine 
representation  of  the  values  was  not  correct.  This  was  due 
to  the  addition  of  the  eight  bits  shown  in  figure  18. 

24-Bit  XDS  9300  Word 


32-Bit  Word  Read  by 
IBM  360 


Corrected  IBM  360  Word 

FIGURE  18. 

The    data    conversion    program    (Appendix    A.l)    was    used    to    read 
the    data    from   the    seven    track    tape   and   move    the    bits    of 
each    value    as    required.       The    program    did    not    make    the 
conversion    from    ones    complement     representation     (XDS    9300) 
to    twos    complement    reo resentat i on     (IBM   360)    because   any 
error    caused    would    be    well     below   the    14-bit    quantization 
error.       At    this    point    the    data    was    converted    to    floating 
point    representation    with    values    between    ±10  0.0    and    the 
average    value    of   each    sequence    was    calculated   and 
subtracted    from   each    data    point.      This    insured    that    the 
input      was    a    zero   mean    function.       Each    data    sequence    was 
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written     into   a    separate    file    of   a    standard    nine    track    IBM 
360    tape    for    ease    of    further    hand! ing. 

D.       SCOPE    OF    SIMULATION    PROGRAM 

The    goal    of    this    research    was    to    demonstrate    the 
feasibility   of   voice   modification    and   as    a    result    only 
certain    areas   were    studied.       Specifically,    all    programming 
was    done    with    the    standard    IBM    360    floating-point 
arithmetic,    making   no   allowance    for    the    effects    which    would 
be    caused    by    the    shorter    word    length    and    integer 
representation    used    in    most    voice    processing    systems. 
Further    study   of    that    area     is    warranted    and    would    be 
especially   critical     in    the    determination    of    the    pole 
location,    which    is    covered    later. 

The    system   degradation    by    background    noise    in    the    input 
speech    was    not    studied   except    to    note    that    the 
vo iced/ unvo i ced    deciion    threshold    would    need    to    be    adjusted 
for    a    noise    environment. 

Although    the    programs    were    written    to    allow   variation 
in    the    order   of    the    prediction,    number    of    samples    per    frame 
and    sampl  fng    interval,    these    were    not    varied.       A    12th    order 
voice    tract    filter   was    used    throughout    and    proved    to    be 
satisfactory.       The    analysis    frame    length    was    25.6   msec. 
(256    samoles)    and   also    remained    unchanged.       In    any    future 
use    of    these    programs    with    a    different    frame    length, 
attention    would    be    required    by    the     input    format    to    insure 
that    the   analysis    frame    length    is   an    integral    multiole    of 
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the  input  record  length. 

Finally,  in  the  following  description  of  the  programs 
the  term  ' LPC  coefficients'  will  refer  to  the  coefficients 
of  the  vocal  tract  model  filter.   The  term  'LPC  parameters' 
will  refer  to  the  entire  set  of  parameters  needed  to 
reconstruct  the  output  speech,  i.e.  the  LPC  parameters 
consist  of  the  LPC  coefficients,  the  gain  parameter,  the 
pitch  period  and  the  voicing  indicator. 

E.   LPC  ENCODING 

The  first  step  of  the  encoding  process  was  to  determine 
the  filter  coefficients.   These  coefficients  were  used  in 
the  inverse  filter  for  determination  of  the  error  signal. 
The  root  mean  square  values  of  the  input  and  error  signals 
were  compared  to  determine  if  the  frame  was  voiced  or 
unvoiced.  Finally  the  pitch  period  was  determined  for 
voiced  frames.  This  program  is  listed  in  Appendix  A. 2. 

I.   LPC  Coefficient  Determination 

Determination  of  the  LPC  coefficients  was  done  with 
the  autocorrelation  method  in  the  subroutine  named  AUTO. 
First,  the  input  data,  s(n),  was  windowed  by  one  of  four 
available  windows  producing  a  temporary  array,  t(n),  of  the 
windowed  data. 

t(n)  =  W(n)  x  s(n) 
The  discrete  autocorrelation  of  the  temporary  array  was 
calculated  for  the  discrete  displacements  of  zero  to  the 
predictor  order,  p. 
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N-j 

R(j)  =  \  t( i)  t( i+j  ) 

i=l 

0  <    j  <    P 

The  next  step  was  the  solution  of  the  following  matrix 

equat  ion . 


R( I i-j  I)  a.  =  R( i) 
j 


j-1 


i  1  i  <.  p 

The  auto  correlation  matrix  in  always  positive  definate, 

symetric  and  all  values  along  a  given  diagonal  are    equal. 

A  particularly  efficient  method  of  solution  is  available. 

This  method  is  attributed  to  Durbin  IMakhoul,  19751  and  is 

implemented  in  subroutine  COEFF.   Durbin's  algorithm  is 

recursive  and  calculates  the  predictor  coefficients  for  the 

Kth  order  from  the  coefficients  for  the  (k-l)th  order.   The 

jth  coefficient  for  the  kth  order  predictor  is  a.(k).  The 

j 

recursion  formulas  follow. 
ECO)  =  RCO) 


a.(k)  = 
J 


j-l 
R(j)  -  )  a. (j-l)  R(j-i) 


i=l 


/E(k-l) 


1  <.  J  1  P 


a.(k)  =  a.(k-l)  -  a,  (k)  a   .(k-1) 
j      j        k    k-j 


UJ  i  (k-l) 
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E(k)  =  (1-a^Ck)  )  E(k-l) 

E(k)  is  the  prediction  order  error  resulting  from  limiting 
the  predictor  order  to  k. 

During  the  programming  of  COEFF  the  subroutine  TEST 
was  written  to  perform  and  print  the  results  of  the  matrix 
multiplication.   During  the  initial  testing  of  the  program 
various  window  functions  were  used  in  AUTO,  however  the 
prediction  order  error  did  not  change  significantly  with 
the  window  function  used. 

Certain  researchers  have  noted  that  a  lower  order 
filter  may  be  used  during  unvoiced  speech.   If  this  is 
desired,  the  coefficients  for  the  lower  order  filters  could 
be  stored  during  the  recursive  steps  of  the  algorithm  above 
and  later,  when  the  frame  is  determined  to  be  unvoiced,  the 
lower  order  filter  coefficients  would  be  available  without 
further  calculation. 

The  coefficients,  a.,  used  in  the  main  program  are 
the  coefficients  of  the  characteristic  polynomial  of  the 
filter  with  a   assumed  to  be  unity. 

1 

H(z)  =    p 

i 


i=0 


a.  z 
i 


Therefore  the  negitive  of  the  values  calculated  in  COEFF 
were  returned  to  the  main  program. 
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2 .   Error  Signal  Determination 

The  error  signal,  e(n),  is  determined  by 
subtracting  the  predicted  sample  value,  *s(n)  from  the 
actual  val ue,  s( n)  . 

e(n)  =  s(n)  -  s(n) 
P 
s(n)  =  -  Z_,  a.  s(n- 


i) 


i=l 


e(n)  =  s( 


n)  +  2_,  a-  s^n- 


i) 


?=1 


This  operation  is  carried  out  by  subroutine  ERR.   In  order 
to  make  a  correct  error  determination  at  the  begining  of 
each  frame,  a  number  of  samples  equal  to  the  order  of  the 
predictor  were  saved  from  the  end  of  the  previous  frame. 
This  eliminated  additional  error  signal  energy  caused  by 
poor  begining  of  frame  prediction  and  reduced  the 
possibility  of  an  incorrect  voicing  decision.   Another 
possible  solution  to  this  problem  would  be  just  not 
analyzing  the  error  for  the  first  few  samples  of  each  frame 
and  making  the  appropriate  changes  in  the  following 
routines  that  use  the  error  signal. 
3 .   Voi  c  i  ng  Dec  i s  ion 

A  comparison  of  input  signal  energy  and  the  error 
signal  energy  was  used  to  determine  if  a  particular  frame 
is  voiced  or  unvoiced.   Although  the  root  mean  square  value 
of  each  set  of  data  is  actually  proportional  to  the  square 
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root  of  the  energy  in  the  signal,  the  root  mean  square 
value  was  used  in  this  comparison.   Whenever  the  root  mean 
square  value  of  the  input  signal  divided  by  the  root  mean 
square  value  of  the  error  signal  was  greater  than  a 
threshold  value,  the  frame  was  determined  to  be  voiced  and 
the  voicing  indicator  was  set  to  one.  Otherwise  the  voicing 
indicator  was  set  to  zero. 

4-.   Pitch  Period  Determination 

The  error  signal  was  used  in  subroutine  PITCH  for 
determination  of  the  pitch  period  of  each  voiced  frame. 
First  the  error  signal  was  passed  through  a  recursive  5th 
order  Butterworth  filter  with  an  800Hz  cut  off,  to  smooth 
the  signal.   Extra  samples  of  the  error  signal  and  filtered 
error  signal  were  saved  from  frame  to  frame  (zeroed  during 
unvoiced  frames)  to  insure  a  correct  filtered  error  signal 
at  the  begining  of  each  frame.   The  degradation  of  the 
system  if  this  was  not  done  was  negligible  but  plots  of  the 
filtered  error  signal  would  have  shown  discontinuities  at 
the  begining  of  each  frame  if  this  had  not  been  done.  The 
frame  was  windowed  to  eliminate  end  effects  and  the 
autocorrelation  function  of  the  filtered  error  signal  is 
calculated.   The  portion  of  the  autocorrelation  function 
from  12  to  180  samples  was  searched  for  peak  values  and  the 
pitch  period  set  equal  to  the  location  of  this  peak. 
Figure  19  shows  a  typical  autocorrelation  function  and  the 
portion  of  the  curve  searched  for  the  peak  value.  The  peak 
picking  algorithm  checked  to  insure  that  the  value  chosen 
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was  not  on  the  downslope  of  the  center  peak  and  was  not  a 
minor  peak  with  a  larger  peak  at  a  longer  pitch  period. 


Ree<hT) 


REGION    SEARCHED 


FI  GURE     19. 

Although  this  pitch  determination  algorithm  worked 
satisfactorily  in  this  program  it  is  probably  not  as 
accurate  and  flexible  as  certain  other,  more  complicated 
techniques  available.   It  was  used  only  for  pitch  periods 
from  about  3  to  9   msec,  but  was  satisfactory  for  them. 

F.   LPC  PARAMETER  MODIFICATION 

The  purpose  of  the  program  was  to  demonstrate  the 
modification  of  voice  characteristics.   The  system  was 
designed  so  that  only  the  LPC  parameters  were  needed  to 
make  the  desired  modifications.   No  other  measurements  of 
the  input  speech  are    needed.   Of  the  parameters  calculated 
from  the  input  speech,  only  the  voicing  indicator  remained 
unchanged.   The  LPC  coefficients  are  varied  as  required  by 
the  desired  formant  frequency  and  bandwidth  changes 
require.  The  pitch  period  is  varied  separately  and  the  gain 
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is  adjusted  to  correct  for  changes  caused  by  formant 
bandwidth  modification. 

1.   LPC  Coefficient  Modification 

The  modification  of  the  LPC  coefficients  is 
accomplished  by  three  subroutines:  POLES,  ALT,  and  NEWCF. 
Subroutine  POLES  calculates  the  z-plane  pole  locations  from 
the  LPC  coefficients.   Subroutine  ALT  changes  the  locations 
of  the  poles  according  to  the  various  scale  factors 
specified  by  the  main  program.   The  new  predictor 
coefficients  are  calculated  by  subroutine  NEWCF. 

The  predictor  coefficients,  a.,  are    provided  to 
subroutine  POLES  to  get  the  p  order  z-domain  polynomial 
which  is  factored  into  its  component  roots,  the  z-plane 
poles  of  the  vocal  tract  filter.   This  factorization  is 
done  with  library  routine  ZRPOLY  which  was  sufficiently 
accurate  and  produced  complex  conjugate  pairs  which  were 
exact  complex  conjugates.   This  simplified  the  problem 
which  came  up  later,  of  separating  the  real  poles  and  the 
complex  conjugate  pairs  so  that  the  proper  scaling  factor 
could  be  applied  to  each.   The  input  polynomial  had  all 
real  coefficients  and  therefore  all  the  roots  are    real  of 
in  complex  conjugate  pairs.   These  poles  are    placed  in  a 
complex  array  and  returned  to  the  main  program. 

The  subroutine  ALT  was  provided  with  the  complex 
array  of  pole  locations  and  it  separated  them  into  separate 
arrays  of  real  and  complex  poles.  Each  complex  conjugate 
pole  pair  was  entered  as  one  entry  in  the  complex  pole 
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array.   The  scaling  factors  provided  to  subroutine  ALT 
consisted  of: 

(1)  FSC  -  Formant  frequency  scaling  factor 

(2)  BSC  -  Formant  bandwidth  scaling  factor 

(3)  RSC  -  Real  pole  scaling  factor 
(it)  RLIM  -  Real  pole  magnitude  limit 
(5)  SP  -  Sampling  period 

The  polar  coordinates  were  determined  for  each  pair 
of  complex  conjugate  poles  and  the  magnitude,  A,  and  angle, 
8,  of  each  were  considered  separately.   The  magnitude  was 
raised  to  the  power  of  the  bandwidth  scale  factor  and  the 
angle  was  multiplied  by  the  frequency  scale  factor. 

BSC 

A1  =  A 

9'  =  9  x  FSC 
The  modified  magnitude,  A',  and  angle,  9',  were  used  to 
determine  the  complex  location  and  the  calculated  pole  and 
its  conjugate  were  put  in  the  pole  vector  for  output. 
During  the  alteration  process  each  complex  pair  of  poles 
was  checked  against  a  constant  magnitude  of  0.98  to  insure 
that  numerical  instability  or  repeated  impulses  would  not 
cause  excessively  large  outputs. 

Each  real  pole  was  multiplied  by  the  real  pole 
scale  factor  and  checked  to  insure  that  the  magnitude  was 
less  than  the  limit  prescribed.  The  effects  of  varying  the 
real  poles  was  not  studied  and  a  real  pole  limit  of  0.95 
proved  to  guarantee  sufficient  damping  of  the  output  to 
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provide  a  nearly  zero  mean  output. 

The  poles  from  both  the  real  and  complex  pole 
arrays  were  combined  into  one  array  for  return  to  the  main 
program.   Subroutine  ALT  also  provided  graphical  and 
printed  output  of  the  pole  locations,  before  and  after 
modification  when  this  was  desired.  Figure  20  is  an  example 
of  the  graphical  output  which  shows  the  z-plane  pole 
locations  before  and  after  modification,  in  relation  to  the 
un  i  t  ci  rcl e . 


FIGURE  20.  VOCAL  TRACT  POLES 
X   INPUT 
+  AFTER  MODIFICATION 

Subroutine  NEWCF  performed  the  task  of  multiplying 
the  poles  to  calculate  the  coefficients  of  the  modified 
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characteristic  equation  for  the  vocal  tract  filter.   This 
operation  was  done  in  double  precision  arithmetic  because 
the  predictor  coefficients  being  calculated  often  differed 
by  only  small  amounts.   This  process  would  require  close 
study  before  this  system  could  be  implemented  on  a  short 
word  length  processor. 

2.   Pitch  Period  Modification 

The  pitch  period  was  modified  in  the  main  program 
and  consisted  only  of  converting  the  pitch  period  (an 
integer)  to  floating  point  representation,  multiplying  by 
the  pitch  period  scale  factor,  and  reconverting  to  fixed 
point  representation.  Although  changing  the  pitch  period  is 
relatively  simple,  a  number  of  other  changes  are    caused  by 
modifying  the  pitch  period.   If  the  pitch  period  is 
shortened  the  gain  must  be  reduced  to  make  up  for  the 
increased  energy  being  input  to  the  vocal  tract  filter. 
The  relationship  between  the  pitch  period  and  the  formant 
bandwidth  also  requires  further  study.  It  appears  that  the 
formant  bandwidths  (Q's  of  the  vocal  tract  resonators) 
should  produce  a  impulse  response  which  is  significantly 
attenuated  by  the  time  the  next  impulse  is  input  to  the 
filter.   There  is  most  likely  a  feedback  effect  between  the 
vocal  tract  resonators  and  the  vocal  cords  vibration  rate 
which  is  not  considered  by  the  model  used.  This  effect  is 
noted  in  the  graphical  output  as  sharp  discontinuities  at 
the  point  where  each  new  impulse  is  generated. 
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3 .   Gain  Adjustment 

Although  overall  gain  of  the  system  can  be  adjusted 

easily  at  the  output,  the  relative  amplitude  from  frame  to 

frame  must  be  retained  during  the  processing.   The  gain 

coefficient,  root  mean  square  of  the  error  function,  is 

adjusted  to  account  for  the  change  in  the  energy  of  the 

vocal  tract  impulse  response  brought  about  by  the  bandwidth 

changes.  As  was  described  earlier  the  ratio  of  the  original 

and  modified  vocal  tract  filter  gain  a  zero  frequency  is 

used  to  estimate  the  ratio  of  inpulse  response  energy. 

Although  this  is  not  strictly  true,  as  long  as  the  scaling 

factors  are  limited  to  those  which  produce  realistic 

speech  sounds,  this  appears  to  work  very  well.   The  zero 

frequency  gain  of  the  original  vocal  tract  filter,  G(in), 

is  calculated  before  the  LPC  coefficients  are  modified. 

P 

G( in)  *  \       a. 

i=0 

The  value  of  both  a   and  a'  is  unity.  After  the 

o  o 

coefficients  are  modified  the  same  calculation  is  performed 
aga I n. 

P 
G(out)  =   )  a! 


i=0 
The  root  mean  square  of  the  error  signal,  rms(E),  is 
multiplied  by  the  ratio  to  obtain  the  new  ?a i n  coefficient, 
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rrns'CE)  . 

rms'(E)  =  rms(E)  x  G(in)  /  G(out) 

G.   SPEECH  RECONSTRUCTION 

Reconstruction  of  the  sampled  speech  waveform,  from  the 
modified  LPC  parameters  is  accor.pl  i  shed  by  subroutine 
RECCN.  This  routine  not  only  decodes  both  voiced  and 
unvoiced  speech,  but  also  makes  allowance  for  the 
transition  of  varying  parameters  from  frame  to  frame.  The 
LPC  parameters  from  the  previous  frame  are  saved  between 
calls  to  subroutine  COEFF  and  are  used  during  the  current 
frame  when  needed.  It  is  also  necessary  to  save  output 
values  from  the  previous  frame  to  allow  the  recursive 
calculation  of  the  output  values  at  the  begining  of  the 
current  frame. 

1.   Unvioced  Speech 

Curing  continuous  unvoiced  speech  (as  opposed  to 
the  previous  frame  being  voiced)  the  new  LPC  parameters  are 
used  immediately  upon  entry  to  subroutine  RECON.  The 
excitation  function  is  determined  by  calling  a  library 
routine  GGNOF  which  returns  normally  distributed  random 
numbers  with  zero  mean  and   a  variance  of  unity,  and 
multiplying  the  value  returned  by  the  gain  parameter.  The 
excitation  function  is  changed  for  every  output  sample  to 
simulate  the  continuous  excitation  caused  by  turbulent  air 
in  the  vocal  tract.   The  vocal  tract  filter  Is  implemented 
by  the  recursive  addition  of  past  values  of  the  output  to 
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the  excitation  function.   The  z-domain  transfer  function 

s(z)   1 

e(z )  =      p 

1+Z  a->£' 

i=l 

is  implemented  with  the  discrete  time  function 

P 
s(n)  =  e(n)  -  r~*  a.  s(n-i) 

L     ' 

i=i 

where  s(n)  is  the  output  sample  and  e(n)  is  the  excitation 
f unct  ion . 

2 .   Vo  ?  ced  Speech 

During  voiced  speech  a  certain  amount  of  continuity 
must  be  maintained  from  frame  to  frame.   This  was 
accomplished  by  allowing  any  uncompleted  pulses  from  the 
previous  frame  to  finish  before  the  parameters  are    changed. 
Immediately  upon  entering  the  subroutine  during  voiced 
speech  the  pulse  period  counter  is  tested  to  see  if  it  is 
equal  to  the  former  pulse  period.  If  the  former  pulse  is 
not  complete  the  routine  goes  ahead  and  recursively 
calculates  the  output  values.  Upon  completion  of  a  pulse 
from  a  former  frame  or  any  pulse  during  the  current  frame, 
the  new  LPC  parameters  are    used  to  replace  the  old  one. 
There  was  a  direct  replacement  for  all  parameters  except 
the  gain  coefficient.  The  geometric  mean  of  the  old  and  new 
gain  coefficients  is  used  for  the  gain  on  the  current  pulse 
and  the  old  gain  replaced  with  the  gain  just  calculated. 
This  provides  for  the  difference  between  the  old  and  new 
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gain  parameters  to  decay  exponentially  but  prevents  sharp 
changes  in  amplitude  from  frame  to  frame  and  make  the 
output  speech  more  natural. 
3.   Transition  Frames 

If  the  current  frame  and  the  previous  frame  were 
not  of  the  same  type  care  must  be  taken  to  insure  that  all 
parameters  are  changed  together.  If  LPC  coefficients  for 
unvoiced  speech  were  used  with  a  pulsed  output  an  unnatural 
sound  would  be  likely  to  be  produced.   During  the 
transition  from  unvoiced  co  voiced  speech,  the  retained 
values  from  the  previous  frame  are    normally  small  in 
comparison  to  the  amplitude  of  the  pulsed  excitation 
function.   Therefore  the  voiced  speech  production  may  begin 
immediately.   When  the  opposite  is  true,  the  large 
amplitude  samples  near  the  begining  of  a  output  pulse  are 
significantly  larger  than  the  unvoiced  excitation  values. 
Therefore  whenever  unvoiced  speech  follows  a  voiced  frame, 
the  previous  output  pulse  is  allowed  to  finish.   The 
damping  that  occurs  during  the  voiced  pulse  normally 
reduces  the  magnitude  of  the  samples  near  the  end  of  the 
pulse  to  the  point  where  they  will  not  interfere  with  the 
unvoiced  speech  to  follow. 

H.   OUTPUT  PROCESSING 

The  reconstruced  speech  samples  are   output  onto  a 
standard  nine  track  IBM  360  magnetic  tape.  These  values 
were  later  input  to  a  data  conversion  program   (Appendix 
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A.if)  which  converted  the  floating  point  values  to  integers 
which  were  in  the  proper  format  for  the  XDS  9300  and  within 
an  appropriate  range  for  the  XDS  9300's  digital  to  analog 
converter.  The  necessity  of  using  a  seven  track  tape  for 
data  transfer  still  existed,  so  the  significant  bit  of  the 
integers  had  to  be  shifted  into  the  proper  position  so  that 
none  of  the  eight  bits  dropped  during  the  writing  of  each 
value  onto  the  seven  track  tape  would  effect  the  data. 
This  tape  was  input  to  the  XDS  9300  which  via  the  digital 
to  analog  converter  made  the  samples  available  on  the 
COMCOR  5000  in  analog  form. 

These  samples  were  output  at  a  rate  of  5000  per  second 
thru  a  sample  and  hold  circuit.   Again  two  1 ow  pass  filters 
were  used  to  remove  the  time  quantization  noise  from  the 
samples.  The  analog  waveform  was  recorded  at  3  3/h    i ps  on  a 
standard  tape  recorder  which  could  be  played  at  7  1/2  i ps 
to  hear  the  reconstruced  speech. 

I .   GRAPHICAL  OUTPUT 

The  programs  described  above  were  also  able  to  produce 
a  varity  of  graphical  outputs  to  assist  the  researcher  in 
following  the  signals  through  the  LPC  processing.  The 
waveforms  available  from  these  programs  are: 

( 1)  I nput  speech 

(2)  Error  signal  before  filtering 

(3)  Error  signal  after  filtering 
('>+)    Reconstructed  output  speech 
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The  z-plane  pole  locations  determine  the  formant 
frequencies  and  bandwidths  and  were  also  available  for 
graphical  display.  A  seperate  program  (Appendix  A. 3)  was 
written  to  display  the  logarithmic  power  spectral  density 
of  the  input  and  output  speech  for  a  number  of  consecutive 
frames  and  proved  useful  in  analysis  of  the  output  quality, 
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VI  I .   RESULTS 

The  desired  result  of  this  study  was  the  reconstruction 
of  speech  at  different  pitch  and  formant  frequencies  than 
that  of  the  input  speech.   The  complete  process  of 
encoding,,  modification  and  decoding  was  accomplished  for 
three  5-second  segments  of  speech.   Upon  completion  of  the 
process  most  listeners  agreed  that  although  the  input 
speech  was  female,  the  modified  output  speech  sounded 
typically  male.   Although  the  audio  output  was  somewhat 
lacking  in  quality  it  was  intelligible. 

Examples  of  the  printed  and  graphical  computer  output 
are  given  in  Appendix  B.   Two  examples  are    completely 
covered.  The  first  3Sk   msec,  segment  (15  frames)  is  of  the 
vowel  'e'  and  the  second  segment  is  of  the  transition  from 
a  fricative  to  a  voiced  sound,  'sa',  from  the  begining  of 
the  word  salt.   Both  were  derived  from  a  recording  of  a 
female  speaker  were  reconstructed  first  without 
modification  and  then  with  mod i f i caat i ons  which  consisted 
of  reduction  of  the  pitch  frequency  by  a  factor  of  0.58  and 
reduction  of  the  formant  frequencies  by  a  factor  of  0.88. 
First  the  input  waveform  with  the  logarithmic  power 
spectral  density  plot  of  that  portion  of  the  speech  is 
given.   Examples  of  the  printed  processing  summary  are    next 
and  are  followed  by  the  waveforms  of  the  error  signal  and 
the  filtered  error  signal.  Plots  of  the  vocal  tract  pole 
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locations  are    shown  with  the  poles  at  input  superimposed  on 
the  poles  after  modification.  Finally,  speech  waveforms  for 
both  unmodified  and  modified  output  with  their  respective 
logarithmic  power  spectral  density  functions  are    displayed. 
The  audio  output  is  available  from  the  author  on  request, 
in  the  form  of  an  audio  tape  recording.   This  tape 
recording  is  described  in  detail  in  Appendix  C. 

The  results  above  demonstrate  the  feasibility  of  the 
use  of  linear  predictive  coding  as  a  technique  for  voice 
modification.   This  research  also  indicated  areas  in  which 
further  study  and  improvement  may  be  made.  Some  of  these 
areas  are: 

(1)  The  effect  of  noise  during  voiced  speech  on  the 
prediction  error  and  on  the  gain  calculated  from 
the  error.  It  may  be  possible  to  use  only  the 
energy  occuring  at  the  peaks  of  the  error  signal 
and  thereby  attribute  the  remainder  of  the  error 
signal  as  being  due  to  noise. 

(2)  The  effect  of  the  use  of  different  window 
functions  in  autocorrelation  function  calculation 
and  how  this  variation  effects  pitch  period 
determination  and  the  voicing  threshold. 

(3)  The  possibility  of  constructing  a  LPC 
processing  system  with  asyncronous  clocks  for  the 
frame  timer  and  the  output  sample  gereration.  This 
would  produce  a  very  similar  effect  to  that 
accomplished  here,  but  probably  at  a  reduced  cost. 
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VIM.   CONCLUSIONS 

With  the  refinement  and  standardization  of  LPC 
commuication  processors,  the  ratio  of  processing  time  to 
real  time  for  unaltered  communication  is  expected  to  drop 
below  the  current  65%.  The  available  computation  time  may 
be  used  for  the  pitch  and  formant  alteration  described 
above  or  for  other  modification  which  can  be  accomplished 
at  either  the  transmitting  or  receiving  processor  and  still 
allow  real  time  voice  communications. 

A  number  of  possible  applications  of  the  speech 
frequency  characteristic  modification  described  are: 

(1)  A  digital  hearing  aid  for  persons  (such  as  the 
author)  with  high  frequency  hearing  loss. 

(2)  Radios  in  military  vehicles  which  would  produce 
speech  In  a  frequency  range  different  than  the 
range  of  the  predominant  noise  in  the  vehicle,  i.e. 
1 ow  pitch  voice  in  turbine  aircraft  with  high 
frequency  noise  and  high  pitched  voice  for 
helicopters  and  tanks  where  low  freauency  noise  is 
most  prevalent. 

(3)  Voice  channel  jammers  which  would  produce 
random  phonemes  with  pitch  and  formant 
characteristics  similar  to  the  current  users  of  the 
channel . 

As  LPC  communications  systems  become  common  because  of 

their  low  data  rate  requirements,  the  use  of  the  LPC 

parameter  modification  will  be  desired  to  extend  the 

flexiblity  of  voice  communication  and  storage  systems. 

Frequency  modification  is  one  viable  process  available. 
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APPENDIX    A.l 


SEVEN    TRACK 
PROGRAM 


TO   NINE    TRACK    TAPE    CONVERSION 


DI  NENSI ON 
FACTOR    = 
REWIND   2 
REWINO    4 
N=1C24 
K=0 

5  K=K+1 
IF(K.GT.l 

6  BSU«    =    0. 

7  J  =  0 

8  J=J+1 
IFU.LE.5 

12  READ(2tl5 

GO    TO    12 

10  REAC(2,15 

15  F3RMAT(12 

17  CALL      FOR 

JJ=( J-l)* 

SUM    =0.0 

DO    20    1=1 

11    =    I+JJ 

DAT(  II  )    = 

SUM    =    SUM 

20  CONTINUE 

SUM    =    SUM 

*RITE<6,2 

25  F0RMAT(40 

*  ■    HAS 
IF     (J.LE. 

30  FORMAT(  » 
IF     (J.LE. 

31  FOR  MAT!  IX 
BSUM    =    BS 

90  GO    TC    8 

2G0         J=J-1 

WRITE<6,2 
205         FORMAT!1 

*  «  RECO 
8SUM  =  BS 
DO  55  J=l 
DATU)     = 

95  CONTINUE 

WRITE(4,9 
93  F3RMAT(12 

EN CF I  LE    4 
WRITE(6,3 
100  GO    TO    5 

60  vmITE<6,6 

65  PORMAT(» 

GO    TO    17 
END 


IDAT(1024)1CAT(53248) 
100.  0/(2.0**23) 


3)     STOP 
0 


0)    GO    to    10 
,END=200)     IDAT 

,END=200,EPR=6Q)    IDAT 

8<  8A4)  ) 

M(IDATfN) 

1024 

,1024 

FLOAT(IDATd)  )*FACTOR 
+    D  AT  (  1 1  ) 


/I 02 4. 

5)     J,K 

X,'*    RECORD    '  ,13, 

BEEN    READ     *«  ) 

1 


X,'*    RECORD    '.IS,'     OF    FILE    ',13, 

BEEN    READ     *«  ) 

1)     WRITE(6,30J     K,SUM,(DAT( L  ),L  =  i,1024) 

FILE    =',13,'     AVG    =     • ,E16.7//(1X,3E14.7)) 

1)     WRITE(6,31)     IOAT 

,8115) 

UM     +    SUM 


,8115) 

UM    +    SUM 


05)     K,J 

END    CF    FILE    '  ,13,  16, 

RDS    HAVE    BEEN    READ'  ) 

UM/FLOAT(  J  ) 

,51200 

DAT(  JJ-BSUM 

8)     (DAT(L)  ,L  =  1  ,51200) 

8A4) 

0)     K,BSUM, (DAT(L), L=l ,1024) 

5)     K 

**    ERR    FILE' ,13,'    **  '  ) 
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APPENDIX     A. 2  LINEAR    PREDICTIVE    COOING    AND    VOICE 

MODIFICATION  PROGRAM 


C 

C   LINEAR  PREDICTIVE  CODING  AND  SPEECH  MODIFICATION 

C   PRCGRAN 

C 

C      SAMPLEC    SPEECH    IS    INPUT    VIA    FILE    FT02FCG1    (TAPE    OR 

C       DISK)     IN    FORMAT    128A4    FOR    EFFICIENT    STORAGE 

C 

C      SPEECH    IS    ENCODED    INTO    LPC    CONSISTING    OF    PITCH    PERIOD 

C       (IPP),    VOICED/UNVOICED    DECISION    ( IV F )  ,    GAIN    FACTCR 

C       (RMSE),    AND    LPC    COEFFICIENTS    (A(I)) 

C 

C   MODIFICATIONS  TO  CHANGE  POLE  POSITIONS  MAY  BE  SPECIFIED 

C   SAMPLED  SPEECH  IS  RECONSTRUCTED  AND  OUTPUT  ONTO  FILE 

C   FT03F00It  ALSO  IN  128A4  FORMAT 

C 

C       PROGRAMMED    BY    G.T.HALLt    1 S78 

C 

DI MENS  I  ON    X«256)tA(14)fXX{14)tE(256ltX0(2561 

DIMENSION    EF(256)  , E S( 5 )  ,  EFS ( 5 )  ,  ZERO (256) 

COMPLEX    P  (14-) 

OATA    XX,EFS»ES,ZER0/230*0.0/ 
C 

C      SET    VOICE/UNVOICE    THRESHOLD 
C 

THRESH   =     2.05 

IWIN    =    1 
C 

C      SET    ORDER    OF    PREDICTOR 
C 

IP    =    12 
C 

C       PLOTTER    OUTPUT 

C  1=  INPUT  2=ERR0R    SIGNAL 

C  3  =  FIJ_TERED    ERROR         4-OUTPUR 

C  5=P0LE    LOCATION    (FIRST    NPLPLT    FRAMES) 

C 

IXPLT    =    5 

NPLPLT    =    10 
C 

C       SET    IWRXX=1    FOR    PRINTED    RESULTS    FROM    SUB 
C 

IWR    =    1 

IWRERR    =    0 

IWRAUT    ■    0 

IWRALT    =    1 

IWRPP    =    0 

IWRPOL    =    0 

IWRNC    =    1 
C 

C     -SET    MODIFICATIONS    DESIRED 
C  (FSC)    FREQUENCY    SCALE    COEFF 

C  (BSC)    BANDWIDTH    SCALE    COEFF 

C  (PSC)    PITCH    PERIOD    SCALE   COEFF 

C  (RSC)    REAL    POLE    SCALE    COEFF 

C  (RLIM)    REAL    POLE    MAGNITUDE    LIMIT 

C  (SP)    SAMPLING    INTERVAL 

C 

FSC    =    0.38 

BSC    =    0.63 

PSC    =    1.73 

RSC    =    1  .0 

RLIM    =    0.95 

SP    =    COO  01 
C 

C       SET    NUMBER    OF    SAMPLES     PER    FRAMc 
C 

N    =    256 
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c 

C       SET    NUMBER    OF    FRAMES     (NFRAME)    AND    NUMBER    OF 

C       FRAMES    SKIPPED    BEFORE     FIRST    ANALYZED 

C 

NFRAME    =    15 

ISKIP    —    28 

IF     (ISKIP. LE.O)    GO    TO    2 

00    1    L    =    1,1 SKIP 

READ    (2, 15, END    =    999)     (X<J),J=1,N) 

1  CONTINUE 

2  IF  (IXPLT.  LT.  5. AND,  IXPLT. GT.O)    CALL    \<PLTIN(N) 
DO    200    I    =    1, NFRAME 

READ    (2, 15, END    =    999)     (X(J),J=1,N) 
15  F0RMATQ28A4) 

IF     (IXPLT. EQ.l)    CALL    VPLT(X) 
C 

C      DETERMINE    RMS    VALUE    OF    SPEECH    SAMPLES 
C 

CALL    RMS     (X,N,RMSX> 

IF    (IWR.EQ.l)    WRI^E    (6,20)     I,RMSX 

20  F0RNATP1FRAME    »  ,  14  //1X,  '  RMS    VALUE    OF    SAMPLES       =     ', 
*    F18.8) 

C 

C       DETERMINE    PREDiCTCR    CGEFF    BY    AUTOCORRELATION    METhGD 

C 

CALL    AUTO     IXrNrAf IPtlWINf IWRAUT) 
IF(IWR.EQ.l)     WRITE(6,21)     (  (  J,  A(  J)  )  ,  J  =  l,  I  P  ) 

21  FORMAT(/iX, 'PREDICTCR    CGEFF I C I ENT S ■ /( 10X, 13,1 X, Fl 3. 8 ) ) 
C 

C      DETERMINE    ZERO    PREQ   GAIN    GF    VOCAL    TRACT   TRANS     FCN 
C 

GIN    =   1  .0 

DO    22    J    =    UIP 

GIN    =    GIN    +    A(J) 

22  CONTINUE 

GIN    =    1.0 /GIN 

IF     (IWR.EQ.l)    WRITE(6,22)    GIN 

23  FQRfATi/'     G    IN=«,F10.5) 
C 

C      DETERMINE    POLES    OF    CHARACTERISTIC    EQUATION 
C 

CALL    POLES    (A,IP,P,IV<RPOL,ICK) 
C 

C       INVERSE    FILTER    SAMPLES    TO    GET    ERROR    SIGNAL 
C 

CALL    ERR     (X,N, A, IP, E,XX ) 

IP    (IXPLT. EQ. 2)    CALL    VPLT(E) 

IF    (  IWRERR.EQ.l)    WRITE    (6,25)     (E(J),J    =    1,N) 
25  FGRVAT(1X,10F12.4) 

C 

C       DETERMINE    RMS    VALUE    OF    ERROR 
C 

CALL    RMS    (E,.N,RMSE) 

IF     (IWR.EQ.l)    WRITE    (6,30)     RMSE 
30  FORMAT(  AX, 'RMS    VALUE    GF    ERROR       =     ',F18.3) 

RATIO       =    RMSX/RMSE 

IF     (IWR.EQ.l)     WRITE     (6,40)    RATIO 

40  F0RMAT(/1X, 'RATIO    SAMPLE    RMS    TC    ERRCR    RMS    =    l,F18.8) 
C 

C       TEST       IF    VOICED    CR   UNVOICED 
C 

IV  F    =    0 

IF     (RATIO. GE. THRESH)    IVF    =    1 

IF     (  IVF. EQ.l)     WRITE    (  6,41) 

41  FGRNAT(/«     THIS    FRAME    IS    VOICED'/) 
I*     (IVF.EQ.O)     WRITE     (6,42) 

42  FORMATt/'     THIS    FRAME    IS    UNVOICED'/) 
C 

C       IF    UNVGICED    BYPASS   PITCH    DETECTION 


C 


IF     (IVF.EQ.O)    GO    TO    45 

CALL    PITCH    (N,E,EF,ES,EFS,IPP,IWRPP) 
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IF     (IXPLT. EG. 3)     CALL   VPLT(EF) 
GO    TO    49 

C 

C      IF    UNVOICED    ZERO    SAVED    POST    FILTER    ERRCR 

C 

45  CO    46    J    =    1,5 
EFS(J)     =    0.0 

46  CONTINUE 

IF     (IXPLT. EQ.3)    CALL    VPLT(ZERO) 
C 

C      DETERMINE   NEW    PITCH    PERIOD 
C 

49  IPPN    =    IFIX(FL0AT( IPP)*PSC+0.  5) 
C 

C      ALTER     POLE    LOCATIONS 

C 

IF (I.EQ.l  .AND.IXPLT.EQ.5)    CALL    PLOTS (  IA, I B ,  IC  ) 
IFd.EQ.NPLPLT.AND.  IXPLT.EQ.5)    IXPLT=0 
CALL    ALT2 (P,FSC,8SC,RSC ,RLIM, SP , IP, IWRALT, IXPLT) 
WUTE(6,511     IPPN 

51  FORMAT*/'     PITCH   PERIOD    AFTER    MODI FICAT I  ON • , 13 ) 
C 

C      CALCULATE    NEW    PREDICTOR    COEFFICIENTS 
C 

CALL    NEWCF(IP,P,A,IWRNC) 

DO    5C    J    =    1,IP 

J  J    =    J+N-IP 

XX(  J)     ■    X(  JJ) 

50  CONTINUE 
C 

C      DETERMINE    ZERO    PREQ    GAIN    CF    VOCAL    TRACT    TRANS     FCN 
C 

GOUT    =1.0 

00     52    J    =    UiP 

GOUT    =    G3UT+AU  ) 

52  CONTINUE 

GOUT   =    1.0/GOLT 

IF    (IWR.EG.l)    WRITE(6,53)    GOUT 

53  FORMAT!/1     G    OUT    =  ',F10.5) 
C 

C      ADJUST    OUTPUT    GAIN 

C 

RMSE   =    RMSE*GIN/GOUT 

CALL    RECON(A,IP,RMSE,IVP,IPPN,N, XC) 

IF     (IWR.EQ.1)    WRITE     (6,54)     (XQ(L),L    =    1,N) 

54  FORVAT(/»     OUTPUT    SAVPLES'/( 1X,10F13.5) ) 
IF     (IXPLT. EQ. 4)     CALL    VPLTIXO) 
WRITE(3,15)     ( X0( J) , J=1,N) 

200         CONTINUE 
999  IPEN    =    999 

CALL    PLOT  (A, B,  IPEN ) 

STOP 

END 


79 


SUBROUTINE  AUTO  ( S , N, A, IP, IWIN, IWR  ) 
C 

C  DETERMINE  LINEAR  PREDICTION  COEFFICIENTS  FOR  A  SET  OF 

C  INPUT  SAMPLES  USING  THE  AUTQCORRELAT I CN  METHOD 
C 

C  S  =  VECTOR  OF  INPUT  SAMPLES 

C  N  =  NUMBER  OF  SAMPLES 

C  A  =  VECTOR  OF  PREDICTOR  COEFFICIENTS 

C  IP  =  NUMBER  OF  PREDICTCR  CCEFF  (  OROER  CF  MODEL  ) 
C  IP.LT.17 

C 

C  IWIN  =  TYPE  OF  WINDOW  <  SEE  SU  3R  WINDOW  ) 

C   IWR  =  0  NO  PRINTING  OF  PREDICTION  COEFFICIENTS 

r 

C   REF:  MAKHOUL:  LINEAR  PREDICTION 
C  PROC  IEEE,  APR  75 

C 

DIMENSION  SCI)  tT(512)  tR(16)tA(l) 

CALL  WINDW  { S,T,N, I  WIN) 
C 

C   CALCULATE  AUTOCORRECATI CN 
C 

RO  =  0.0 

DO  10  1=1, N 

RO    =    RO    ♦    T(  I  )**2 
10  CONTINUE 

DO    3C   J=1,IP 

SUM    -    0.0 

N.N    =    N-J 

DO    20    1=1, NN 
SUM    =    SUM+T(I)*T (I+J ) 
20  CONTINUE 

R(J  )    =    SUM 

30  CONTINUE 

IF(  IwR.EQ  .!)•    WRITE(6t31)    RO  ,  (  R  (L  )  ,  L  =  It  I  P  ) 

31  F0RM4T(/1X, rAUTOCOR£L    V ALS ' , F16 . 5/1X, 3F16. 5/1 >, £F16 . 5 ) 
C 

C       SOLVE    "ATRIX    EQN    FOR    A    VECTOR 
C 

CALL    COEFF(R0,R,IP,A,IWR) 
C 

C       TAKE    NEGITIVE    OF    PREDICTOR    COEFF    TO    GET 
C      COE<=F    OF    CHARACTERISTIC    EQN    OF    FILTER 


C 


DO    &C    I    =    1 ?IP 
A(  I  J    =  -A  (  I  > 


60  CONTINUE 

IF    (iWR.NE.O)     WRITE(6,70)     (  ( I  , A (  I )  )  ,  1=1 , 1 P ) 

70  FORMATt/lX, 'PREDICTOR    C CEFFIC IEN TS ' / ( 10X, I  3 , IX ,F18. 8 ) ) 

RETURN 
END 
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SUBROUTINE    COEFF ( RO  , R ,N, A, I WR) 
C 

C  SOLVES    THE    MATRIX    EQUATION    PR    A    =    R 
C 

C  RR    =       AUTOCORRELATION    MATRIX 

C  RR    =       R(O)  R(I)         R(2) R(M-l) 

C  R<1)  R(O)         R(l) R(N-2> 

C  S(2)  R(l)         R(O) R(N-3) 

C 

C  R(M-l)     R(N-2)    R(N-3) R(O)' 

r 

C      R    =  AUTOCORRELATION    VECTOR 

C      R    =         R(l) 

C  R(2) 

C  R(3) 

C 

C  P(N) 

C 

C   A  =  VECTOR  OF  PREDICTOR  COEFF 

C   A  =  AU) 

C       A(2) 

C       A<3) 

C 

C       A  (  N  ) 

C 

C       METHOD    ATTRIBUTED   TO    DURBIM    AS    DESCRIBED    IN 

C       'LINEAR    PREDICTION*    BY    VAKHOUL,PROC    IEEE    APR    75 

C   P.  566 

C 

DIMENSION  AK(20) t 40(20) ,A(20) ,R(2C) 
C 

C  FIRST    ITERATION 

C 

EO    =    RO 

AK(1)    =    RQ)/EO 

4(1)    =    AK ( 1  ) 

E    =    (1.0-AKM)**2)*E0 

EO     =    E 

A0(1)     =    A(l.) 
C 

C  FOLLOWING    ITERATIONS 

C 

DO     100    I     =    2,N 

IM1    =    I-i 

SUM    =    0.0 

DO    20    J    =    ItlMl 
iMJ    =     I-J 

SUM    =    SUM+R(IMJ)*AQ(J) 
20  CONTINUE 

AK(I)     =    ( R(i)-SUM)/EC 

A(  I  )    =    AMI  ) 

CO    30    J    =    1, IM1 
IMJ    =    I-U 

MJ)     =    A0(  J)-AK(  I)*AO(IMJ) 
30  CONTINUE 

E    =    (1.0-AKCI )**2) *E0 

EO    =    E 

DO    50    J    =    1,1 

40(J*    -    A(J) 
50  CONTINUE 

100  CONTINUE 
C 

C       PRINT     £     (REMAINING    ERROR    DUE    TO    LIMITING 

C       ORDER    CF    APPROXIMATION)     AND    A    CHECK    CF    SOLUTION 

C       IF    DESIRED 

C 

IMIWR.EQ.l  1     WRITE(6»101)     E 

101  FORMAT* «    SUB    COEFF    E=    «,F18.8) 
IFIIWR.EQ.11     CALL    TEST  (  A,  RO  ,  R  ,N  ) 
RETURN 

END 
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SUBROUTINE    TEST    (A,RO,RTIP) 
C 

C      MULTIPLIES    PREDICTOR    CCEFF    VECTOR 
C       A    BY    THE    AUTOCORRELATION    MATRIX    RR    AND    CHECKS 
C       THE    VALUE    AGAINST    THE     AUTOCORRELATION    VECTOR 
C       TO    INSURE    ACCURATE    SOLUTION. 
C 

DIMENSION    A(IF),R(IP1 
DO    1C    I    =    If  IP 
SUM    =    0.0 

DO    9    J    =    1,IP 

L    =    IABS(  I-J) 

IF<L.EC.0>     SUM    =    SUM+A(J)*RO 

IF(L.NE.O)     SUM    =    SUM+A(J)*R(L) 

9  CONTINUE 
WRiTE(6,15)     I,R(I)tSUM 

15  FORMATl'    R(',I2,')     =    ',2E14.4,'     =    SIMM 

10  CONTINUE 
RETURN 
END 


SUBROUTINE    POLES    ( A , I P, P, I WR, ICK ) 
C 

C      CALCULATES    POLES    OF    CHARACTERISTIC    ECN    FROM 
C       PREDICTOR    COEFFICIENTS    AND    IF    WANTED     PRINTS 
C      OR    PLOTS    THOSE    POLES 
C 

C       A    =    VECTOR    OF    PREDICTOR    COEFFICIENTS 
C       IP    =    NUMBER    CF    CCEFF    ANC    POLES 
C  COEFF    AO    IS    ASSUMED    TO    BE    1.0 

C       P    =   COMPLEX    VECTOR    OF    POLE    LOCATIONS 
C       IWR    =    0    NO    PRINTING    OF    POLES 
C       ICK    =    0    ALL    POLES    INSIDE    UNIT    CIRCLE 
C  =1    POLE    OUTSIDE    UNIT    CIRCLE 

C 

DIMENSION    A(1),B<21) ,X<  20) ,Y( 20) , NAME (20) 

COMPLEX    P  (!) 

8(1  )    =   1.0 

CO    10    1=1,  IP 

II  =  1+1 

3(11)    =   A(I) 

10  CONTINUE 

II P    =    I P+! 

CALL    ZRP0LY(8,IP,P,IER) 

IF(IWR.NE.O)    WRITE(6,20)     (  (  I ,  P<  I  )  )  ♦  1=  1, 1  P  ) 
20  FORMAT( //10X, 'POLES    CF     CHAR    EGN» / ( lOX , 13, IX, 2 E14 .7 ) ) 

ICK    =    0 

DO    30    I    =    1,1  P 

IF (CABS(P( I) ) .LE.l. 0)    GO    TO    30 

ICK    =    1 

IF(IWR.NE.O)     WRITE(6,25)     I 
25  F0RMATC20X,*P0LE    NUMBER    ',13, 

*       ■     ABOVE    IS    OUTSIDE    UNIT   CIRCLE') 
30  CONTINUE 

RETURN 

END 
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SUBROUTINE    ERR    ( S  ,  N  ,  A , I P,E,SX ) 
C 

C      DETERMINE    AN    ERRCR   VECTOR    CF    DIFFERENCE 
C       BETWEEN    ACTUAL    SAMPLE     VALUES    AND    THE 
C      VALUES    PRECICTED    FROM    PAST    SAMPLES. 
C 

C       S    =    VECTOR    OF    SAMPLES 
C      N    =    NUVGER    OF    SAMPLES 
C      A    =    VECTOR    OF    PREDICTOR    COEFF 
C       IP    =    NUMBER    OF    PREDICTOR    COEFF 
C       E    =   VECTOR    OF    ERROR    VALUES 
C       SX    =       EXTRA    SAMPLES    (     IP    OF    THEM    ) 
C  SAVED    FROM    LAST    FRAME 

C 

C      THE    ERROR    IN    THE    DIFFERENCE    BETWEEN    THE 
C       CURRENT    SAMPLE    AND    THE    WEIGHTED    SUM    OF 
C      ThE    LAST    IP    SAMPLES. 
C 

DIMENSION    SU)tA(l)  ,  E  ( 1 )  f  T  (  542  ),  SX  {  1) 

DO    10    1*1, IP 

T<  I  )    =    SX(I) 
10  CONTINUE 

DO  20  I  =1,N 

TCI+IP)  =  S(I) 
20     CONTINUE 

DO  4C  I=i»N 

SUM  =  0.0 

00    30    J=l ,IP 
11    =    I+J-l 
JJ    =    IP-J+1 
SUM    ■    SUM+T(II ) *A(JJ) 
30  CONTINUE 

E< II    =    S(I )    +    SUM 
40  CONTINUE 

RETURN 

END 
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SUBROUTINE    P  ITCHCN,  E,EF,ES,EFS,IPP,IWR) 
C 

C       DETERMINES    PITCH    PERIOD    (IN    NUMBER    OF    SAMPLES) 
C       FROM    THE    ERROR    SIGNAL    OF    INVERSE    FILTERED    SPEECH 
C 

C      N    =    NUMBER    OF    SAMPLES 
C 

C      E    =   ERROR    VECTOR 
C 

C      EF    =    FILTERED    ERRCR    VECTOR     (OUTPUT) 
C 

C       ES    =    FIVE   SAVED    ERROR    SAMPLES 
C 

C       EFS    =    FIVE    SAVED    FILTERED    ERROR    SAMPLES 
C 

C       IPP    =    PITCH    PERICD    (OUTPUT) 
C 

C       IWR    =    1    FOR    PRINTING    DURING    SUBROUTINE 
C 
C 

DI MENS  I  ON    ES (5) ,EFS(5),  E( 1 ) , EF( 1  )  , R (256 ) 

DIMENSION     XI  (261)  ,  X0(261) 
C 

C         FORM    FILTERING    VECTOR ( N+5 ) 
C 

DO    10    1=1,5 

XI<  I)  =5S(  I  ) 

X0(  I)  =  EFS (I  ) 
10  CONTINUE 

ITEMP=N+5 

DO    15     1=6, ITEMP 

11=1-5 

XI(  I  )  =  E( II) 
15  CONTINUE 

DO    20    1=6, ITEMP 
C 

C  BUTTERWORTH    DIGITAL    FILTER    CUTOFF    AT    800    HZ 

C 

XO(I)    =    0 .447451239E-3*XI( I )+ C. 2257 2562E-2*XI ( 1-1) 

*  *0.44745!24E-2*XI(  I-2)+0  .447451239E-2  *XI  (1-3) 

*  +0.22  372562E-2=XI (I-4>  +  0. 447 45123 9E- 3* XI ( 1-5 ) 

*  +3.41077231*X0(  I-  U-4.7  32  3C  £  37*X0  (  1-2) 

*  +3.42  533523*X0(I-3)-1.249  29  545*X0(I-4) 

*  +0,  185257941*X0( 1-51 
C 

EF (I-5)=X0(I) 
20  CONTINUE 

00    30    1=1  ,5 

ES( I)=E( I+N-5) 

EFS(  I)=EF(  I+N-5  ) 
30  CONTINUE 

IWIN  =  4 

CALL    WINDW(EF,XQ,N, IWIN) 
C 

C      CHECK    FOR    PEAKS     1.2   TO     13. C    MSEC 
C 

ITEMPC=N-56 

IF(JWR.EQ.l)    WRITE(6,33)     ( (  EF  (  L  )  ,  X0(  L)  )  ,L=1  ,  N  ) 
33  F0RMAT(1X,10F13.5) 

00    5Q    I  =  1,ITEMPC 

SUM=0.0 

ITE*PA=N-I 

DO    40    J=1,ITEMPA 
SUM=SUM+XO  (J  )*XO(J+I  ) 

40  CONTINUE 
R(  I  )=SUM 

IF<  IWR. EQ.l)    WRITE(6,41)     I,R(I) 

41  FORMATC     FILTERED    ERROR    AUTOCORRELATION    FOR',  14, 

*     Fia.a) 

IF(I.LT.33)    GC   TO    50 

ITEST=I-25 

IF  (R(ITEST)  .LT  .0.0  )    GO    TO    50 


84 


ITEMPB=I-34 

CO    45    J=ITEMPB,  I 

IF( R(I TESTJ.LT.R  (J) )     GOTO    50 

45  CONTINUE 
JPP=ITEST 
WRITE<6,46)     IFP 

46  FORMAT    ('     PITCH   PERIOD    ISM4) 
RETURN 

50  CONTINUE 

IPP=100 

WRITE(6 ,55) 
55  FORMAT!///'     SIB 

*    /'     PITCH,     PITCH 

RETURN 

END 


PITCH    FAILED    TO   DETERMINE    CCRFECT' 
PERIOD    SET    EQUAL    TO    100'///) 
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c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 


4 

c 


6 

c 

9 

C 

c 
c 


10 
20 


30 


SUBROUTINE    ALT2    ( P , FSC  ,  BSC, RS C, RL  IM , SP, IP,  IWR  ,  IXPLT  ) 

GIVEN    IP    COMPLEX     POLES     OF    THE    VOCAL    TRACT 
TRANSFER    FUNCTION,    CALCULATES    THfc    FORMANT 
FREQUENCIES    AND    BAND'W  IDTHS    AMD    SCALES    THEM 
AS    DESIRED,     PRINTED    OUTPUT    IS    AVAILABLE. 

P    =   VECTOR    OF    IP    COMPLEX    POLES 

FSC    =    FREQUENCY    SCALE    FACTGR    OUT/IN 

RSC    =    REAL    POLE    SCALE    FACTOR 

RLIM    =    REAL    POLE    MAGNITUDE    LIMIT 

BSC    =    BANDWIDTH    SCALE     FACTOR    OUT/IN 

IP    =    NUMBER    OF    POLES 

SP    ■    SAMPLE    PERICD    IN    SECONDS 

IWR    =    C    NO    OUTPUT    PRINTED 

1    PRINTED    RESULTS 
IXPLT    =    5    FOR    PLOT    OF    POLES 

DIMENSION    F0RFQ4)  ,  EVJ(14) 

COMPLEX    PQ),CPP<14)  ,CRPQ4)  ,CTSM 

DIMENSION    XP< 6) ,YP( 6>,IIPEN(6) 

DATA    XP/ 3. 0,2. 75, -2 .75 t 0.0, 0.0, 2. 5/ 

DATA    YP/ 10. 0,0.0,0.0,2. 75, -2.  75 ,0.0/ 

DATA    IIPEN/-3,3,2,  3,2,3/ 

ZERC=0.0 

IF(  IXPLT. NE. 5)    GO    TO    9 

NPEN    =    3 

CALL    NEWPEN<  NPEN) 

DO    2    1=1,6 

CALL    PLOT(XPU),YP( 

CONTINUE 

IPEN    =   2 


I) ,IIPEN<I)  ) 


DO    4    1=1,  241 

TEM    =    Q.02613*FL0AT(I ) 

XX    =    2.5    *   COS(TEM) 

YY    =2.5    *    SIN(TEM) 

CALL    PLOTIXX, YY,IPEN) 

CONTINUE 

IPEN    =    3 

CALL    PLOT     (ZERO, ZERO, IPEN) 

HIEG    =    0.  25 

ANG    =    0.0 

NC    =  -1 

I7EJT   =    4 

NPEIS    =   4 

CALL    NEWPEN(NPEN) 

CO    6    1=1,  IP 

XX    =    2.5    *    REAL(P(I ) ) 

YY    =    2.5    *    AIMAG(P(  I  )) 

CALL    SYMBCL(XX,YY,HIEG,  IT  EXT  ,  AN  G,  NC  ) 

CONTINUE 

IRP    =0 

ICP    =    0 

TEST    EACH    POLE    AND    PLACE    IN    PROPER    ARRAY 

DO    40    1=1  ,IP 

IF(AIMAG(  P(  I)  KEQ.O.O)     GO    TO    30 

IF  (ICP.EQ.O)     GO    TO    20 

DC    10    J=l, ICP 

IF(CA3S(P(  I)-CONJG(CPP<J)H  .LT.O.OOl)     GO    TC    40 

CONTINUE 
ICP=ICP+i 
CPP( ICP)=P(I  ) 
GO    TO    40 
IRP=IRP+1 
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CRPURP  )  =  PM> 
40  CONTINUE 

C 

C      CALCJLATE    FORMANT    FREQ    AND    BANDWIDTH    FOR    EACH 
C 

DO    50    1=1, ICP 
A=CABS  <CPP<I)  ) 

BW(I)=<0.0-ALOG(A) ) / { 6, 2831 352* SP ) 
TH=ATAN2(  AIMAG(CPP  (  I)),P-EAL(CPP(  I  )  )  ) 
TH=A8S(TH) 

cORF(  I  )=TH/(SP*6.2  8318  52) 
50  CONTINUE 

ICPMl=ICP-i 

DO     £0    I=1,ICPM1 

IPl-H-X 

DO    55    J=IP1,ICP 

IF(FORF{  I  )  .LT.FORFU  ))    GO    TO    55 
TEM=BW  (I  ) 
8W(I)=8w(  J) 
8W  (J  )  =  TEM 
TEM=FORF(I  ) 
FORF( I  )=FORF(J) 
FORF(J  )=TEM 
C7EM=C  PP(I  ) 
CPP( I) =CPP(J) 
CPP(J)=CTEM 
55  CONTINUE 

60  CONTINUE 

IF(IWR.EQ.l)    WRITE(6,70J     ( ( I,  CPP(  I J  ,  FORF(  I  ),BW(  I)  I  , 

*  1  =  1, ICP) 

70  FORMAT*'     FORMANT',  13,  '     DUE    TO    POLES    AT    Z=',F8.4, 

*  »+-J*',F8.4V     FORMANT     FR£Q= • , F8 .1 ,  «    BANDW IDT F  = ■ , F8  .1 ) 
IP    < IRP.cQ.Q)    GO    TO     85 

IF(IWR.EQ.l)    WRITE16,30)     (  {  I,  CRP  (  I  )  )  ,  I  =  1,  IR  P  ) 
80  F3RMATC     REAL    POLE    NUMBER    • ,13,'     AT    Z    =',2F8.4) 

85  IF(IWR.EQ.l)    WRITE(6,90)    FSC , 3SC, RSC  ,RLI M, SP 

90  FORMAT*/'     FORMANT    FREQUENCY    SCALE    FACTOR    =',F9.4, 

*  '         3ANDWIDTH     SCALE    FACTOR    =  '  ,F3.4/ 

*  «    REAL    POLE   SCALE    FACTOR    =',F8.4, 

*  »       REAL    PCLE    MAGNITUDE    LIMIT    =  ',F8.4, 

*  '       SAMPLE    PERIOD    =  «,F9.6//'     AFTER    MODIFICATION') 
C 

C      ALTER     FORMANT    FREQUENCIES    ANO    BANDWICThS 
C 

CO    100     1=1, -ICP 

A=CABS(CPP(I)  )**BSC 

IF(  \  .GT  .0.93)     A=0.99 

TH=ATAN2(  AI  MAG(  CPP  ( I  H,  ?EAL  (CPP  {  I  )  )  )*FSC 

TH=ABS(TH) 

CPP(I)=A*CMPLX(COS(TH),  SIN(TH)  ) 

BW(  H-<  O.O-ALOG(A)  )  /  (6  .  28  313  5  2*S  P  ) 

FORF(  I)=TH/(  6. 283 185 2* SP) 
100         CONTINUE 
C 

C      ALTER    REAL    POLE    LOCATIONS 
C 

IF     (IRP.EQ.Q)     GO    TO    115 

DO    110    1=1,  I.RP 

CRP(I)=CRP( I)*RSC 

TEM=CABS(CRP< I  )  ) 

IF(TEM.GT.RLIM)    CRP  (I  )= CRP ( I ) *RL IM/T EM 
110         CONTINUE 
115  IF(IWR.EQ.l)    WRITE(6,70)     <  (  I ,  CPP  <  I  )  ,FORF  (  I  )  ,  Bfc  ( I  )  )  , 

*  1=1, ICP) 

1^    ( IRP.cQ.O)    GO    TO    113 

IF(IWR.EQ.l)    WRITE(6,30)     U  I, CRP (  l)  )  ,  I =1, IR P ) 
C 

C      RECONSTRUCT    ARRAY    OF     POLES 
C 
118  INO=C 

DO    120     1=1, ICP 

IND  =  IN0+1 
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120 


130 
135 
140 


P<  IN0)=CPP( I ) 

IND*IND+1 

P(  IND)=C0NJG(CPP( I)  ) 

CONTINUE 

IF     (IRP.EQ.OJ    GO    TO    135 

DO    130    1=1, IRP 

INC=IND+1 

P(INO)=CRPU) 

CONTINUE 

IF(IWR.EQ.l):    WRITE(6tl40J    INO 

FORMATdOX,1  RECCN    PGLES',14) 

IF  (IXPLT.NE.5)    RETURN 


ITEXT    =    3 

00    150    1=1,  IP 

XX    =    2.5    *    REALl P(I ) ) 

YY    =    2.5    *    AIMAG(P< I) ) 

CALL    SYMBOL(XX,YY,HEIG,  ITEXT,  AN  G,  NC  1 
150         CONTINUE 
IPEN    =    -3 
XX    =    5.0 
YY    =    -10. 
CALL    PLOT 
RETURN 
END 


(XX, YY,  IPEN) 
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SUBROUTINE    NEWCF     <IP,P,/MWR) 
C 

C      DETERMINES    THE    COEFFICIENTS    OF    THE 
C       PREDICTOR    POLYNOMIAL    FROM    THE    ROOT    OF 
C      ThE    CHARACTERISTIC    EQUATION 
C 

C       IP    =    ORDER    OF    THE    POLYNOMIAL 
C 

C       P    =   COMPLEX    ROOTS    OF    CHARACTERISTIC    EGN 
C  I.E.    POLES    OF    THE    FILTER 

C 

C      A     =    ARRAY    OF    REAL    COEFFICIENTS 
C 

C       IWR    =    1    FOR    PRINTING    DURING   SUBROUTINE 
C 

C       IF    ALL    COMPLEX    ROOTS    ARE    IN   CONJUGATE    FAIRS 
C       ALL    OF    THE    COEFFICIENTS    SHCULO    BE    REAL 
C       THIS   CAN    BE    CHECKED    WITH    OUTPUT 
C 

C0NFLEX*16    PP(14)f  AA(l<t) 

COMPLEXES    PUP) 

REAL*4    A( IP ) 

I    =0.0 

DO     10    I    =    I. IP 

AA(  I)    =    P  (I  J 

P<MI)     =    P(I) 
10  CONTINUE 

K    =     I? 

M    =    IP-1 

DO    40    L   »    It  M 

00   30    I    =    2fK 
AA( I)     =    AA(I )*AA(I-1) 
3  0  CONTINUE 

K    =    K--  ] 

DO    20    I    =■    1,K 
J    =    H-L 

4AUI     =    PP(J)*AA(II 
20  CONTINUE 

40  CONTINUE 

K    =    IP/2 

K    =    2*K/( IP-K ) 

DO    50    I    =    K,IF,2 

AA(  I)     =    -AAU  ) 
50  CONTINUE 

DO    60    I     =    1, I F 

J    =    IP«-1-I 

A( J )    =REAL(AA  (I  )) 

PP{ J)     =    AAU) 
60  CONTINUE 

IF  (1WR.NE.1  i    RETURN 

WRITE(6t70)     (  (  I  *PP(  I  )  )  v  I    »   1,1  PI 
70  FORMAT (/'     RECONSTRUCTED    POLYNOMIAL    COEFFICIENTS'/ 

*  20X, • IMAGINARY    TERMS    SHOULD    BE    ZERO'/ 

*  ( lX,I5,Fia.8t=18.4)  ) 
RETURN 

END 
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SUBROUTINE    RECON( A , I P ,RMS ,1 VF  ,IPP,N,S) 
C 

C       RECONSTRUCTS    SPEECH    SAMPLES    FROM    LPC    COEFP,     ETC 
C 

C      A       =       VECTOR    OF    LPC    CCEFF 

C       IP       =       NUMBER    OF    COEFP     (ORDER    OF    FILTER) 
C       RMS      =      RMS    VALUE    OF    ERROR    SIGNAL 
C       IVF       =0    UNVOICED 
C  =1    VOICED 

C       IPP       =       PITCH    PERIOD     IN    NUMBER    OF    SAMPLES 
C       N       =       SAMPLES    PEP    FRAME 
C      S       =       SAMPLE    VECTOR     (OLTP'JT) 
C 

DIMENSION    A(l ),S(1) ,X(270) ,XX ( 14 ) , AC (14 ) 

CAT A    XX,RMSO, I  SEED , I VF 0/1 5*0. 0,122  4,0/ 

DO    10    I    =    1,1  F 

X(  I)     =    XXU  ) 
10  CONTINUE 

M  P    =    N+I  P 

N5    =    1+1? 
C 

C       IF    CURRENT    PULSE    UNFINISHED    DON'T    CHANGS    COEFF    YET 
C 

IF(  IVFO.NE.O  )    GO    TO    400 
C 

C   UPDATE  COEFF 
C 
100    RMSO  =  SQRT(RVSO*RMS) 

IF(IVF.EQ.O)  RMSO  =  RMS 

IF(RMSO.LT. (RMS/2.0)  )  RMS  0= RMS/ 2.0 

DO  10  5  I  =  1,IP 

AO(I)  =  A(I  ) 
105    CONTINUE 

IVFC  =  IVF 

iPPO  =  IPP 

r 

C   TEST  IF  VOICED 
C 

IF( IVFO.NE.O)  GO  TO  300 
C 

C       RECCNSTRUCT    UNVOICED    SPEECH 
C 
200  E    =    RMSO*GGNOF(ISEED) 

CO    210    I    =1, IP 

NSMI    =    NS-I 

E    =     £-A( I  )*X(NSMI  ) 
210         CONTINUE 

X(NS  )    =    E 

IF(NS.Gc.NIP)     GO   TO    600 

NS    =    NS  +  1 

GO    TO    200 
C 

C       START    VOICED    PULSE 
C 
300         NP    =    1 

EX    =RMSO*SQRT(FLOAT(IPPO) ) 
C 

C       TEST    FOR    BEGINING    OF    PULSE    PERIOD 
C 
400         IF (NP.GT. IPPO)    GO    TC    100 

E    =    CO 

IF  (NP.EO.  1)     E    =   -EX 
C 

C   RECONSTRUCT  VOICED  SPEECH 
C 
500    CO  510  I  =  1 ,IP 

NSMI  =  NS-I 

E  =  E-A( IJ*X(NSMI ) 
510    CONTINUE 

NP  =  NP+1 

X(NS)  =  E 

IF( NS.GE.NIP)  GO  TO  600 
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NS    =    NS+1 

GO    TO    400 
C 

C       SAVE    VALUES    ANO    PREPARE   OUTPUT 
C 
600         DO    610    I    =    1,  IP 

XX( I)    =    X(  N+I  ) 
610         CONTINUE 

DO  620  I  =  1,N 

S( I  )  =  X(  I+IP) 
620    CONTINUE 

RETURN 

END 


SUBROUTINE    RMS    (X,N,VAL) 
C 

C       DETERMINE    THE    RMS    VALUE    OF    A    S  ET    OF    CAT  A 
C 

C      X    =    VECTOR    OF   -INPUT    SAMPLES 
C       N    =    NUMBER    OF    SAMPLES 
C       VAL    =    RMS    VALUE    RETURNED 
C 

DIMENSION    X(l) 

VAL    =    0.3 

DO    10    I    =    1,N 

VAL    =    VAL+X( I )**2 
10  CONTINUE 

VAL    =    SQRT(VAL/FLOAT(N) ) 

RETURN 

ENC 
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SUBROUTINE    WINOW(  XfY»Nf  IWIN  I 

C 

C      X    =    VECTOR    OF    UNWINDOWED    SAMPLES 

C      Y     =    VECTOR    OF    WINDOWED    SAMPLES    (OUTPUT) 

C      N    =    NUMBER    OF    SAMPLES 

C       Iw IN    =    TYPE    OF    WINDOW 

C      0     =    RECTANGULAR       (COPY    ONLY) 

C       1    =    HAMMING     (ALPHA    =    0.54) 

C      2    =    BARTLETT 

C      3    =    BLACKMAN 

C       4    =    HAISNING 

C 

DIMENSION    X(1),YQ) 

CAT  A    PIfTWOPI  ,FOR  PI /3.  141  59  26,  6.  23  31  853, 12.  5  663  71/ 

IF(  IWIN. LT.O.CR. IWIN. GT .4)    GO    TO    999 

AN    =    FLOAT(N) 

GO    TO    (110,210,310,410) ,IWIN 
C 

C       RECTANGULAR    WINDOW      COPY    VECTOR 
C 
10  CO    20    1=1  ,N 

Y(  I  )    =    X(  I) 
20  CONTINUE 

RETURN 
C 

C       HAMMING    WINDCW 
C 
110         00    120    1  =  1,  N 

A  J    =    FLOAT     (1-1  J 

YU)    =    X(  I  )*(  C.  54-0.46*C0S(  TWOPI *  A J/ ( AN-1  .0)  )  ) 
120  CONTINUE 

RETURN 
C 

C       BARTLETT    WINOOW 
C 
210         NN    =   N/2 

NNN    =    NN+1 

00    220    1=1, NN 

AJ    =    FLOAT  (•!-!) 

Y(I)    =    X(I)*2.0*AJ/  (AN-1.0) 
220         CONTINUE 

00    230    I=NNN,N 

A J    =    FLOAT(I-l) 

Y(  I  )    =    X(  I  )*2.O*(1.0-AJ/(AN-1.0)  ) 
230  CONTINUE 

RETURN 
C 

C       BLACK  MAN    WINOOW 
C 
310         CO    320    I=1TN 

AJ    =   FLOATU.-li 

Y(I)    =    X( I)*( 0.42-0. 5*CGS(TW0PI*AJ/(AN-1. 0) ) 
*  *0.08*CCS( FORPI*AJ/ (AN-l.C)  )  ) 

320         CONTINUE 

RETURN 
C 

C       HANNING    WINOOW 
C 
410         CO    420    I=1TN 

AJ    =    FLOAT(I-l) 

Y(I)    =    X(  I)*0  .5*(1  .0-COS(TWOPI*AJ/<  AN-1.0)  )  ) 
420         CONTINUE 

RETURN 
999  WRITE(6,998) 

998  F0RMAT(//10X,'**    ERRCR    SUBR    WINOOW    **<//) 

STOP 

END 
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SUBROUTINE  VPLTIN  (N) 
C 

C   SUBROUTINE  CREATES  A  VERSAPLOT  GRAPH  CF  60  FRAMES 
C   OF  VOICE  SAMPLES  (128  SAMPLES  f    FRAME) 
C 

C   CALL  VPLTIN  TO  INITIALIZE  EACH  PLOT 
C 

C   CALL  VPLT  FOR  EACH  FRAME 
C 

C  N=NUMBEP    CF    SAMPLES    PER    FRAME 

C  X=VECTOR    OF    SAMPLES 

C 

C      CALLING    PROGRAM    SHOULD    ISSUE 
C  CALL    PLOTU,  Y,999) 

C      TO    COMPLETE    PLOTTING 
C 

DIMENSION    X(768), Y( 256) ,X0(8)  ,Y0(8) 

DATA    XO/0  .0,0  .0,7 .0  ,0.0 ,7 ,0,0. 0,7.0 1 0.0/ 

DATA    YO/10.  ,-10.,  10.  ,-lC.  ,10.  ,-10 . , 10. ,-1 0 ./ 

DO    10    1=1,768 

X( I )=FLOAT(I J/128. 0 
10  CONTINUE 

CALL    PLOTS  CIA,  I  B,  I  C  ) 

NPEN=2 

CALL    NEWPEN(NPEN) 

NPLT=1 

I  PEN    =    -3 

CALL    PLOT     (XO(NPLT) ,YOt NPLT ) ,IPEN) 

IPEN=2 

IX=768 

IY=11 

RETURN 
C 
C 

ENTRY    VPLT(Y) 

DO    100    1=1, N 

IX=IX+1 

IF( IX.LE.768)    GO    TO    50 

IX  =  1 

IY=IY-1 

YS=2.0+0.7*FLCAT(I Y) 

IF(  IY  .GE  .1)    GO   TO    40 

NPLT  =  NPLT+i 

IPEN=-3 

CALL    PLOT (XO(NPLT), YO(NPLT) ,IPEN) 

IPEN=2 

IX=1 

IY=10 

YS=2.0+0.7*FLCAT(IY) 
40  IP  EN  =3 

YY=Y(  D/100.0+YS 

CALL    PLOT(XdX)  ,YY,IPEN) 

IPEN=2 

GO    TC    100 
50  YY=V(I) /100.0+YS 

CALL    PLOT  (X(IX)vYYf  IPEN  ) 
100         CONTINUE 

RETURN 

6NC 
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APPENDIX    A. 3 


POWER    SPECTRAL    DENSITY    ANALYSIS    AND 
PLOTTING     FROGRAM 


9 
10 


20 
25 


30 


90 


DIME 

READ 

FORM 

IF(I 

DO    9 

REAL 

CONT 

M=8 

READ 

CALL 

CALL 

K  =  0 

READ 

FORV 

CALL 

IFU 

FORN 

K=K+ 

CALL 

IF(K 

GO    T 

IP  EN 

CALL 

STOP 

END 


NS  I  ON    X(2  56) 

<5,8,END=9Q>     I NUM, I  SKI P ,1  WIN 

£T<3I5) 

SKIP.EQ-0  )    GO    TO    10 

1=1,  I  SKIP 
(2,25,END=90)    X 
INUE 

(2,25  ,END=90)    X 
PSDINTt  XtM) 
SPLINT 


(2,25  ,END=90)     X 
AT  (128A4) 

PSD( X,M,IWIN) 
•  LE.6)     WRITE<6,20) 
AT(1X,10F12.5) 
1 

SPL     (X) 
•GT.INUM)     GO    TO    90 
0    20 
=  999 

FLOT(AX,Y,IPEN  ) 


(X( J),J  =  1,  126) 
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SLBROUTINE    SPLINT 
C 

C      SUBROUTINE     PLOTS    THE    POWER    SPECTRAL    DENSITY 
C       (LOG    OP    MAGNITUDE)     FOR    128    FREQUENCIES    WHICH 
C       IS     INPUT     IN    MAGNITUDE    FORM    IN    VECTOR     Y 
C 

C      VALUES    IN    Y    SHOULD    BE     BETWEEN    O.Ol    AND    100.0 
C 

C      CALL    SPLINT  TO    INITIALIZE   PLOTTING 

C 

C      CALL    SFL    (Y)  FOR    EVERY    SET    OF    123    PSD    VALUES 

C 

C      CALLING    PROGRAM    SHOULD    ISSUE    CALL    PLOT    (X,Y,999) 
C       WhEN    PLOTTING   iS    COMPLETE 
C 

c 

DIMENSION    Yfil  ),XQ28),YYQ28  > 

DIMENSION    R0PGX(6)  ,R0RGY<6)  ,GX( 19 )  ,GY( 19 > , I GP (1 9 ) 

C 

C   DATA  FCR  SIX  PLOT  ORIGINS 

C 

CAT*    RORGX/0. 1,-1.2,-1.2,3.  8,  -1.2,  -1.2/ 
DATA    RORGY/0.5,4.0,4.0,-17.0,4.0,4.0/ 

C 

C       DATA    TC    PLOT     AXIS 


C 


CAT  A    GX/ 7.  5, 7.5, 6. C, 6. 0,4. 5, 4, 5, 3. 0,2.0,1 .5, 


*  1.5,0.0 ,0.0,-0.1 ,0.0,-0.1, 0.0,-0.1,0.0,-0.1/ 
DATA    GY/0.0,-C.l,0.  0,-0.1  ,0.0  ,-0.1  ,0.0,-0.1  , 

*  0.0,-0.1,-0.1,0.80Q,C.300,0.60  0,0.600,0.4,C4, 

*  C. 200,  0.200/ 

DATA    I GP/2, 2, 2, 2, 3, 2, 3, 2,3,2,3,2,2,2,2,3,2,3,2/ 

00    10    1=1,128 

X(  I)=Fi_OAT(  I-i  3*0.05  85  9-0.  04 
10  CONTINUE 

CALL    PLOTS     (IA,IS,IC) 

If=LAG  =  0 

IPLTN=1 

I5CAN=0 
15  IPEN=-3 

CALL    PLOT  (RORGX<IPLTN),RORGY(  IPLTN  )  ,  IPEN) 

N?EN=4 

CALL    NEWPEN(NPEN) 

00  30    1=1  ,19 

CALL    PLOT  (GX(I  )  ,GY(  I)  ,IGP(D) 
30  CONTINUE 

NPEN=2 

CALL    NEWPEN(NPEN) 

RETURN 
C 
C 

ENTRY    SPL     (Y) 

1  SCAN    =    I  SCAN    +    1 
C 

C       RETURN     IMMEDIATELY    IF     PLOT    FULL 
C 

IF( IFLAG.EQ.l)    RETURN 
C 

C      CONVERT    DATA    TO    LOG    PLOT 
C 

DO    50    1=1,128 

YTEM=Y<  I  ) 

IF  (YTEM.LT.O  .100)    YTEM=C.10G 

YY(I)=0.10+0.2000*ALCG10(YTEM) 
50  CONTINUE 

IP EN=-3 

XSCANO=0. 04 

YSCANQ=0.  1 

CALL    PLCT     (XSCANO,  YSCANO,  IPEN  ) 

IPEN=3 

CALL    PLOT  (X(l  ),YY(1  ),  IPEN  ) 

IPEN=2 
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DO     60    1  =  2,128 

CALL    PLOT     (X(I),YY (  I),  I  FEN ) 
60  CONTINUE 

IF  (ISCAN.LE.29)    RETURN 

ISCAN  =  0 

IPLTN=IPLTN+1 

IF{  IPLTN.LE.6  )    GO    TO    15 

IFLAG=1 

RETURN 

ENC 
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SUBROUTINE  PSCINT(X,M) 
C 

C  POWER    SPECTRAL    DENSITY    BASED    ON    ALGORITHM 

C  PRESENTED    BY    C    M    RACER     IN »     •  AN     IMPRCVEO 

C  ALGORITHM    FOR    HIGH     SPEED    AUTOCORRELATION 

C  WlTh    APPLICATION    TO    SPECTRAL    ESTIMATION,' 

C  IEEE    TRANS    AUDI 0,E LECTR CACOUSTICS ,    V    AU-18,DEC70 
C 

C  X    =    VECTOR    OF    INPUT    SAMPLES 

C  M    =    POWER    OF    2    FOR    NUMBER    OF    SAMPLES 

C  IWIN    =    0    NO    WINDOW 

C  1     HAMMING     (ALPHA    =    0.54) 

C  2    BARTLETT 

C  3    BLACKMAN 

C  4    HANNING 

C 

C  FIRST    CALL     IS    TO    PSDINT    AND    THEN    EACH    SUCESSIVE 

C  CALL    FOR    THAT    STRING    OF    DATA    SHOULC    8E   TO    PSD 

C  TO    START    A    FRESH    STRING    OF    DATA    CALL    PSDINT    AGAIN 
C 

c 

DIMENSION    X(256),  I  WK(ll) 

COMPLEX    XN<512)  ,XNP(512  ),  YNM512  >,  AK512) 

DATA    XN,XNP/1C24*{  C.0,C0)  / 

MM    =    M+l 

N=2**M 

NN=2*N 
C 

C  SPECIFY    COEFFICIENTS    NEEDED    IN    ADDITION 

C  OF    NEXT    X(F)     VECTOR    TO    CURRENT    X(F)     VECTOR 

C  TO    MAKE    Y(F)    VECTOR.    IN    BINARY    REVERSE    ORDER. 

c    • 

NNN    =    NN-1 

DO    90    1=1, NNN, 2 

Aid)    =    (1.0, CO) 

11    =    1+1 

AI(  II J  =  (-1.0,0.0) 
90     CONTINUE 

CALL  FFRDR2  (AI,MM, IWK) 

AIMG  =  0.  0 

DO  101  I  =  XtN 

XN(  I)  =  CMPLX(X(I  ),  AIMG) 
101    CONTINUE 
C 

C      FFT  OF  CURRENT  X(T)  VECTOR, LAST  HALF  ZERO. 
C 

CALL  FFT2  (XN,MM,IWK) 

RETURN 
C 

C      USE  THIS  ENTRY  FOR  EACH  FRAME  AFTER  FIRST 
C 

ENTRY  PSD  (X,M,IWIN) 

MM  =  M+l 

N  =  2**M 

NN  =  2*N 

AN  -    FLOAT  (,N) 

ANN  =  FLOAT  ( NN ) 

4IMG  =  0.0 

DO  110  I  =  1 ,N 

XNP(  I)  =  CMPLX(X(  I  ),AIMG) 
110    CONTINUE 
C 

C      FFT  of  NEXT  X(T)  VECTOR, LAST  HALF  ZERO. 
C 

CALL  FFT2  (XNP,MM,IWK) 
C 

C  FORM    Y(F)     VECTCR,CCEFF    IN    REV     BINARY    ORDER. 

C 

CO    120    I    =   l.NN 

YN(I)    =    (XN(I)+AI<I)*XNP(IM*CONJG(XNP(II) 
120         CONTINUE 

CO    123    I     =    1,ISN 
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c 
c 
c 

123 

C 

c 

c 


143 


153 

C 

c 
c 


160 


FORV    CONJG   TO    PREFORM     INV    DFT 

YN(  I)    =    CONJG    (YNId  )) 
CONTINUE 

INV    FFT    OF   Y(F)    GIVES    RXX(TAU) 

CALL    FFT2RV    ( YN,MM,I WK) 

00    143    I    =    1»NN 

YN(  I)    =    CCNJG     (YN(I  )  )  /  ANN 

CONTINUE 

CALL    WIND2     <YN,N, IWIN) 

CALL    FFT2     (YN,M,IWK) 

CALL    FFRDR2     (YN,M,IkK) 

CO    153    I    =    It N 

X(I )    =    CABS    (YN(I) ) /(AN**2> 

CONTINUE 

MOVE    NEXT    X(f=)    INTO    CURRECT    X(F) 

DO    160    I     =    1»  NN 
XNU)     =    XNP(I) 

XNPd  )    =     (0.0,0.0  ) 

CONTINUE 

RETURN 

ENC 


50 

100 


190 

200 


290 
200 


390 
400 


490 


(  (TWOPI*AJ)/( AN-1.01  ) 


SUBROUTINE  WIN02    (B,N,IWIN) 

COMPLEX    B<512  ) 

OAT  A    PI, TW3PI/3. 141 5926,6.283185/ 

M    =    FLGAT(N) 
GO    TO     (200,300,400,100)  , I 
SETURN 

00    190    I     =    1  ,  IS 
AJ    =    FLOAT(I-l) 

F    =    0.5*(1.0-C0S     ( (TWQPI+AJ )/ (AN-1.0) ) ) 
B(  I  )    =    B(I)*F 
CONTINUE 
RETURN 

DO    290    I     =    1 ,N    , 
\J    =    FLOAT(I-l) 
F    =    0.54-0.46*COS 
3(1  )    =    B(  I)*F 
CONTINUE 
RETURN 

DO    390    I    =    1,N 
A  J    =    FLCATU-1) 
IF(I.LE.(N/2) )     F 
1F(  I.GT .(N/2)  ) 
e(I )     =    B( I)*F 
CONTINUE 
RETURN 

DO    490    I     =    1  ,N 
AJ    =    FLOAT(  1-1) 
F    =    0.42-0.5*C0S 
*      C0S(4.0*PI*AJ/(  AN 
3(  I  )    =    B( I)*F 
CONTINUE 
RETURN 
ENC 


F    = 


2. Q*AJ/( AN-1.0) 
2.0-2.G*AJ/(  AN-l.C) 


(TWGPI*AJ/  (AN-l.G  )  J+O.OS* 
1.0)  ) 
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APPENDIX    A.4  NJNE   TRACK   TO    SEVEN    TRACK    TAPE    CONVERSION 

PROGRAM 


DIMENSION    DATQ024),IDATUG24) 

FACT0R=<2.0**23)/25  0.0 

HTEST=2**23-1 

LTEST=-HTEST 

NFILES=6 

REWIND    2 

RtW IND   4 

N=1C24 
C 

£0    200    I=1,NFILES 

WRITE(6,11)    I 
11  FORMAT(  'IFILE', 14) 

C 

00    100    J=l ,50 

READ( 2,I5,END=19C,ERR=30)     DAT 

15  FCRMAT<128A4) 
GO    TO    50 

30  WRITE(6,21) 

21  FORMAT (60X,« READ    ERROR') 

50  ^iRITE(  6,16)    J 

16  FORMAT  UOX,  'RECORD    HAS    BEEN    READ', 14) 
IF(J.EQ.l)     WRITE(6,17)     DAT 

17  FORMAT ( IX, 10F12.3) 
C 

DO    80    K=l,1024 

IDAT(K)=IFIX(DAT(K)*FACTQR) 

IF{  IDAT(K)  .GT.HTEST)     WRITE{6,18)     I,J,K 

18  FORMAT( 4QX,' TCG    LARGE    FILE', 14,'     RECORD', 14. 

*  •     ITEM',  14) 

IF( IDAT(K).LT.LTEST)     WRITE(6,19)     I,J,K 

19  FORMAT* 40X, 'TOO    SMALL    FILE', 14,'     RECORD', 14 

*  '     4TEM',  14  ) 
80                           CONTINUE 

C 

IFU.EC.l)    WRITE(6,20)     IDAT 

20  FORMATCLX, 10112) 
CALL    MORF(  IDAT , N  ) 
WRITE(4,25)    IDAT 

25  F0RMAT(128(  8A4)  ) 
100  CONTINUE 

C 

WRITE(6,26) 

26  F0RNAT(5X,'  ALL    50    RECORCS    READ') 
155         R5AD(2,15,EN0=190)     DAT 

GO    TO   155 
190         WRITE(6,27) 

27  F0RMAT(2X, 'END    OF    FILE') 
ENCFILE   4 

20C         CONTINUE 
C 

STOP 

END 


99 


APPENDIX  B.l    COMPUTER  ANALYSIS  AND  MODIFICATION  OF  VOICED 

SPEECH 


The  15  frame  (  3 3 1+  msec.  )  segment  of  speech  analyzed  in 
this  appendix  is  the  "long  e"  sound  (as  in  need)  and  is 
spoken  by  a  woman.   The  process  illustrated  shows  both 
direct  reconstruction  and  reconstruction  with  the  pitch 
reduced  by  a  factor  of  0.58  and  the  formant  frequencies 
reduced  by  a  factor  of  0.88. 


^M/A^VH^AAVHAVAn/W^VW^ 


Figure    B.l.l      WAVEFORM    OF     INPUT    SPEECH 
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Figure    B.1.4      WAVEFORM    OF    ERROR    SIGNAL 


Figure    B.1.5      WAVEFORM    OF    FILTERED    ERROR    SIGNAL 


10  7 


ff\ 


(U 

E 

03 

i_ 

c 

Ll_ 

o 

C/3 

4-J 

•z. 

03 

o 

U 

h- 

M_ 

< 

o 

"O 

o 

o 

—1 

2 

uu 

s- 

_J 

(U 

o 

4-J 

0. 

< 

1— 

er 

1 

< 

cc 

+ 

1— 

_l 

< 

C 

C_) 

o 

o 

> 

4-1 

03 
O 

•"^"N 

03 

4- 

>>•*' 

U3 

T3 

• 

O 

rH 

S 

• 

cn 

0) 

0) 

o 

L. 

4- 

3 

<u 

bO 

CQ 

%l 

LL. 

i 

X 

<v 

E 

03 

i_ 

108 


<D 

E 

ft! 

V. 

c 

LL. 

o 

CO 

-M 

■z. 

(T3 

o 

U 

H- 

4- 

< 

._ 

CJ> 

Tj 

o 

o 

_j 

s: 

LU 

L. 

_l 

oj 

O 

■u 

a. 

4- 
< 

H- 

O 

1 

< 

Cd 

+ 

1— 

-J 

< 

c 

o 

o 

o 

.— 

> 

S—*\ 

.— 

-Q 

u- 

— ' 

. 

l£> 

T3 

• 

o 

i— 1 

s 

• 

CO 

0) 

CU 

o 

i_ 

M- 

3 

tu 

bC 

CQ 

•^ 

-* 

u_ 

1 
X 

<D 

E 

(T3 

i_ 

109 


Figure  B.1.7   WAVEFORM  OF  UNMODIFIED  OUTPUT  SPEECH 


Figure  B.1.8   WAVEFORM  OF  MODIFIED  OUTPUT  SPEECH 
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APPENDIX  B.2    COMPUTER  ANALYSIS  AND  MODIFICATION  OF 

UNVOI CED  SPEECH 


The  15  frame  (  384  msec.  )  segment  of  speech  analyzed  in 
this  appendix  is  the  "sa"  sound  (begining  of  salt)  and  is 
spoken  by  a  woman.   The  process  illustrated  shows  both 
direct  reconstruction  and  reconstruction  with  the  pitch 
reduced  by  a  factor  of  0.58  and  the  formant  frequencies 
reduced  by  a  factor  of  0.88. 
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Figure    B.2.1      WAVEFORM    OF    INPUT   SPEECH 
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Figure    B.2.U      WAVEFORM    OF    ERROR    SIGNAL 


Figure    B.2.5      WAVEFORM    OF    FILTERED    ERROR    SIGNAL 
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Figure    B.2.7      WAVEFORM    OF    UNMODIFIED    OUTPUT   SPEECH 
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Figure    B.2.8      WAVEFORM    OF    MODIFIED    OUTPUT   SPEECH 
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APPENDIX  C   DESCRIPTION  OF  VOICE  TAPE 

The  audio  recording  which  is  available  from  the  author  has 
four  sections  each  of  which  contains  three  segments  of 
speech.   These  three  speech  segments  are   of  the  following 
sounds : 

Segment  1  -  Five  long  vowels. 

"a  e  i  o  u" 

Segment  2  -  Four  words  which  are  combinations  of 
fricatives  and  voiced  sounds. 

"sat  free  hip  done" 

Segment  3  -  A  sentence  with  a  varity  of  sounds. 

"Every  salt  breeze  comes  from  the  sea." 

Each  of  these  segments  is  repeated  in  each  segment  of  the 

tape.   Each  section  of  the  tape  shows  the  effects  of  a 

different  step  in  the  processing. 


Section  1  -  Unprocessed  speech,  the  recording  used 
for  input  to  the  processing  system. 

Section  2  -  Speech  which  has  been  converted  to 
digital  form  and  then  converted  back  to  analog 
form. with  no  other  processing. 

Section  3  -  Speech  which  has  been  encoded  into  a 
set  of  LPC  parameters  and  then  decoded  using  the 
same  parameters  (i.e.  no  modification). 

Section  k    -   Speech  which  has  been  encoded  into  a 
set  of  LPC  parameters  and  those  parameters  altered 
to  reduce  the  pitch  frequency  by  a  factor  of  0.56 
and  to  reduce  the  formant  frequencies  by  a  factor 
of  0.88.   The  same  LPC  decoding  process  is  then 
used  to  reconstruct  the  speech  segment. 
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